Conventional signal processing based on a parametric representation
of signal spaces and, as a consequence, operations involving either an 
estimation of the parameter(s) given sufficiently many observations or a 
manipulation of the parameter(s) to obtain the desired signals needs an 
exhaustive account of the dependence of the output signal on the input 
signal in terms of criteria with global scope, if not in terms of closed 
form expressions. Alternative approaches, generally non-linear, based 
on grammatical formulations of signal spaces and operations have also 
been suggested in the literature to overcome some of the limitations 
attributed to linearity in conventional signal processing. 





Synopsis 


Several instances of signal processing that generally involve a subjective
element in the processor, though not devoid of invariance (eg, recognition
of handwritten characters, facial recognition, texture identification, etc),
exist wherein neither a parameterization nor a grammatical formulation of
signal spaces is feasible, owing to an insufficient understanding of the
processes underlying signal generation and/or a volume of data grossly
inadequate to establish input-output relationships. However, such situations
can be described through finitely many prototype inputs for which the outputs
are known either completely or partially. Processors of this kind have been
realized through
a class of hierarchical non-linear dynamical systems termed artificial 
neural networks wherein the processor belongs to a parametrically de- 
scribed space and the objective is to estimate (learn) processor param- 
eter(s) given examples of input-output association. 

Current neural network research exhibits a plethora of networks,
each concentrating on representing specific types of input-output
relationships, with accompanying procedures to estimate the processor
parameters given the examples of association. However, the issues
of representation have not been adequately investigated, with the
consequence that no satisfactory criterion exists whereby one can decide
on the architecture of the network necessary for a given situation of
processor realization. In addition, little attempt is made at providing
an axiomatic framework in which neural network architectures and
processor realization can be discussed.

Overview of the Thesis

I focus my investigations on four key issues related to the representation
of signal processors with neural networks, each not unconnected
with the others. Representation is interpreted, in this thesis, as a 
decomposition and/or synthesis of the desired function through ’basis’ 
functions that are not chosen a priori, but are synthesized to suit the 
requirements of processor realization: the requirements are specified 
through finitely many ’examples’ of the desired association. The ’basis’ 
functions are, however, not restricted to be dependent on the family of 
functions under consideration. 

The investigation is initiated by considering representation in iso- 
lated neurons from the perspective of preservance of input spaces in 
input-output associations: preservance of mappings is defined in terms 
of one-one correspondence, order preservation and preservation of reg- 
ularity. I establish the existence of weights, corresponding to isolated 
neurons, that accommodate a preservance of the collection of binary
vectors in a Euclidean space and identify the class of preservance weights
corresponding to the collection of binary vectors. Though this class has
uncountably many elements, these elements are organized in finitely
many ’directions’: the number of directions is related exponentially to
the dimensionality of the collection of binary vectors.
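
The one-one requirement on the weights of an isolated neuron can be
illustrated with a minimal Python sketch. The helper names (`project`,
`is_one_to_one`) and the choice of powers of two as weights are assumptions
made for illustration, not the construction of this thesis; the sketch checks
only the one-one property and leaves aside order and regularity.

```python
from itertools import product


def project(weights, x):
    """Weighted sum formed by an isolated neuron before thresholding."""
    return sum(w * xi for w, xi in zip(weights, x))


def is_one_to_one(weights, n):
    """True if distinct binary vectors of length n get distinct projections."""
    values = [project(weights, x) for x in product((0, 1), repeat=n)]
    return len(set(values)) == len(values)


n = 3
powers_of_two = [2 ** i for i in range(n)]  # assumed example weight vector
all_ones = [1] * n                          # merges vectors with equal bit counts

print(is_one_to_one(powers_of_two, n))  # True
print(is_one_to_one(all_ones, n))       # False
```

Any positive scaling of the first weight vector passes the same check, which
is consistent with a class that has uncountably many elements along a single
direction.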

Preservance is shown to extend to enlarged discrete spaces derived
as certain finite unions of scaled and translated versions of the collection
of binary vectors, without, however, an alteration in the class of
preservance weights. Functions on such discrete spaces under preservance
are equivalent to sequences on the input space. As a consequence,
linear separability is characterized in terms of the number of
sign-transitions in the sequences, and learning is shown to be equivalent
to an enumeration of weights in the class of preservance weights
and a search for a threshold in a linearly ordered space.
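
The link between sign-transitions and a threshold search can be sketched as
follows. The weight vector (1, 2), the two example functions and all helper
names are assumptions chosen for illustration. For a fixed weight with
distinct projections, a single threshold along that weight separates the two
classes only when the label sequence, read in order of projection, changes
sign exactly once.

```python
from itertools import product


def project(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))


def label_sequence(weights, labels):
    """Labels (+1 or -1) of all inputs, listed in increasing order of projection."""
    ordered = sorted(labels, key=lambda x: project(weights, x))
    return [labels[x] for x in ordered]


def sign_transitions(sequence):
    """Number of places where consecutive labels differ."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


def find_threshold(weights, labels):
    """Midpoint threshold when there is exactly one sign-transition, else None."""
    points = sorted((project(weights, x), y) for x, y in labels.items())
    changes = [i for i in range(len(points) - 1) if points[i][1] != points[i + 1][1]]
    if len(changes) != 1:
        return None
    i = changes[0]
    return (points[i][0] + points[i + 1][0]) / 2


weights = (1, 2)  # assumed weight with distinct projections on 2-bit inputs
inputs = list(product((0, 1), repeat=2))
and_fn = {x: (1 if x[0] and x[1] else -1) for x in inputs}
xor_fn = {x: (1 if x[0] != x[1] else -1) for x in inputs}

print(sign_transitions(label_sequence(weights, and_fn)))  # 1
print(sign_transitions(label_sequence(weights, xor_fn)))  # 2
print(find_threshold(weights, and_fn))                    # 2.5
print(find_threshold(weights, xor_fn))                    # None
```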

The radix of numbering is shown to have little influence on preser- 
vance though the cardinality of the discrete space preserved increases 
with the radix and the preservance weights tend to bunch around the 
coordinate axes. Corresponding to every non-null weight vector in a
Euclidean space a preservance input space, defined as a discrete space
for which this weight would be a preservance weight, is identified and 
the preceding discussion is shown to extend to such an input space, 
though with appropriate rotations of the relevant coordinate frames. 

Representation in layered neural signal processors is the next issue
considered in this thesis; the investigation is, however, restricted to the
specific case of feed-forward ensembles realizing maps on preservance
input spaces as linear combinations of neural responses. A single layer
processor restricted to have identical weights in all nodes is first con- 
sidered: the thresholds are, in contrast, allowed to be distinct. Such a 
structure is shown to represent all dichotomies on a preservance input 
space whose preservance weight is used as the common weight. 
The number of nodes is no more than the number of distinct
level-transitions in the sequences along the preservance weight that
represent functions over the preservance input space and the number of
level-transitions is used to test minimality of an architecture. Learning 
in this processing structure is shown to involve a process of approximat- 
ing the collection of inputs described in a training set by a preservance 
input space, a search for a threshold in a linearly ordered space and an 
analytical solution for the coefficients of linear combination. A similar 
situation is shown to exist when the weights in the constituent neurons 
are distinct preservance weights of the same preservance input space. 
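
A single layer processor of this kind can be sketched as below: every node
shares one weight vector, each node has its own threshold, and the output is a
linear combination of the node responses. The bias term, the step
nonlinearity, the midpoint thresholds and the function names are assumptions
made for illustration; the thesis states only that the coefficients admit an
analytical solution.

```python
from itertools import product


def project(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))


def build_processor(weights, target):
    """One step node per level-transition along the common weight.

    Returns (bias, nodes), where nodes is a list of (threshold, coefficient).
    Assumes the weight gives distinct projections for distinct inputs.
    """
    points = sorted((project(weights, x), y) for x, y in target.items())
    bias = points[0][1]  # function value at the smallest projection
    nodes = []
    for (p0, y0), (p1, y1) in zip(points, points[1:]):
        if y1 != y0:
            # threshold midway between neighbours, coefficient equal to the jump
            nodes.append(((p0 + p1) / 2, y1 - y0))
    return bias, nodes


def evaluate(weights, bias, nodes, x):
    """Bias plus the coefficients of all nodes whose threshold is exceeded."""
    s = project(weights, x)
    return bias + sum(c for theta, c in nodes if s > theta)


weights = (1, 2)  # assumed common weight
inputs = list(product((0, 1), repeat=2))
xor_fn = {x: (1 if x[0] != x[1] else -1) for x in inputs}

bias, nodes = build_processor(weights, xor_fn)
print(len(nodes))  # 2
print(all(evaluate(weights, bias, nodes, x) == xor_fn[x] for x in inputs))  # True
```

In this sketch the exclusive-or function, which no single threshold along the
chosen weight can realize, is represented with two nodes, one for each
level-transition in its sequence.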

Multi-layered neural signal processors modeled to realize functions 
as linear combinations of neural network responses are shown to be 
functionally equivalent to single layer neural signal processors; however,
fewer nodes are needed to represent a given function when compared
with that necessary in the corresponding minimal single layer
processor. This is true only when the number of layers is smaller than 
the number of nodes needed in the minimal single-layer processor. As 
the preservance input space is discrete, an algebraic characterization 
of function realization with neural networks is considered to estab- 
lish that linearly separable dichotomies are exactly those partitions on 
(semi) lattices wherein each member, of the partition, is a semi-lattice. 

Issues of representation in neural signal processing architectures 
form the third topic of investigation in this thesis. Typed neural signal 
processors are defined on continuous spaces, the type number reflecting
the degree of layering. Functions realized by neural signal processors
of all types are shown to be dense in the space of continuous functions:
this is an extension of similar results established, on equivalents of
type-1 processors, by Cybenko (1989) and Hornik, Stinchcombe
& White (1989), to cascades of type-1 processors. Through a study
of the functional nature of neural signal processors, four axioms are 
suggested to describe the current architectural commitments in neural 
signal processing activity: these axioms are sufficiently general to aid 
a unified study of neural signal processing architectures. 

1. Axiom of Organization. A neural signal processor is composed of
(layers of) three operational stages: measurement, discrimination
and aggregation, in that order. Preprocessing, if any (preceding,
or incorporated in, the measurement), is sought to be represented
in a neural basis. Measurements are effected on an observation
space constructed as the Cartesian product of the input space and
a relevant subspace of a union of the space of responses of the
distinct layers.

2. Axiom of Measurement. A neural signal processor, through the 
measurement functions in each of the processing (decision mak- 
ing) nodes, induces a foliation, of codimension at least one, in 
the input manifold. This foliation forms the basis of synthesizing 
(approximating) the desired level curves of the function. 

3. Axiom of Discrimination. A neural signal processor, through its
discriminatory functions, renews the foliations, induced on the input
space by the measurement functions, through a transformation, 
of the stems of the foliations, with at least one of the following 
properties. 

(a) alter the indexing of leaves to retain distinctness in a finite 
non-zero number of local regions of the input space, 

(b) introduce multiple components in the leaves, 

(c) associate, to at least one component of a leaf of the foliation
due to discrimination, uncountably many leaves of the
foliation due to measurement.

Re-foliations provide the basis for establishing equivalences be- 
tween members (elements) of the input space in ways not possible 
through the chosen measurement functions. 

4. Axiom of Aggregation. A neural signal processor, through its aggre- 
gation function, synthesizes (or approximates) the level regions of 
processor response through a foliation on the Cartesian product 
of the stems of foliations on the input space due to discrimination. 
Concepts, in neural signal processors, are identified with the level 
regions of processor response. 

These axioms, coupled with the earlier stated algebraic characterization
of linear separability, suggest that the ’paradigm’ of neural computing
(specifically the notions of ’learning’ and ’generalization’) is not
restricted to processors effecting maps between (vector) spaces of
numbers. As the notion of a foliation (Lawson, 1974) is one of inducing a
partition on a space such that the members of the partition belong to an 
indexed collection, these axioms allow attention to be directed towards 
a unified treatment of neural computing, especially the analysis (and
synthesis) of representation with neural networks. In particular, these
axioms provide a framework wherein two tasks can be attempted. The first
is a formulation of problems related to decidability, solvability and
completeness that dominate the theory of computing; these problems lead
to queries about the capability of the neural computing paradigm to
address issues related to the design of neural networks through the
paradigm of neural computing. The second is a relative evaluation of the
formalism of Turing Machines with the paradigm of Neural Networks. A
unification, however, is not in the scope of this thesis.

At an operational level neural signal processors effect (point-wise) 
nonlinear transformation between integral transforms: this interpre- 
tation allows representation in neural networks to be contrasted with 
other approaches to signal processor realization. The resulting
constituents are used to suggest an interpretation of a function
representation theorem due to Kolmogorov (1957a): this interpretation is
different from that provided by Hecht-Nielsen (1987a), Kůrková (1992)
and Kovacec & Ribeiro (1993). Learning, under this interpretation,
is equivalent to kernel design. The possibility of a solution to learning 
with a priori, but partial, knowledge of weights, a situation relevant in 
hybrid networks, is indicated by incorporating neural network based
function realization for the kernels of the integral transforms. 
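
For reference, the representation theorem of Kolmogorov mentioned above is
customarily stated as follows. The symbols used here (outer functions
\Phi_q, inner functions \psi_{q,p}) follow the usual textbook statement and
are assumed notation, not the notation of this thesis.

```latex
% Kolmogorov superposition theorem, customary statement.
% Assumed notation: \Phi_q are the outer functions, \psi_{q,p} the inner functions.
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \psi_{q,p}(x_p) \right),
\qquad (x_1, \dots, x_n) \in [0,1]^n .
```

Here f is any continuous real-valued function on the n-dimensional unit cube,
the \Phi_q and \psi_{q,p} are continuous functions of one variable, and the
inner functions \psi_{q,p} do not depend on f.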

Localization in the representation of neural signal processors is the 
final issue considered in this thesis. A localization in representation is 
shown to result from an influence of the kernels of integral transforms 
as well as from the mechanism of (point-wise) association between
integral transforms. Localization resulting from kernels is shown to
restrict the choice of weights in individual neurons to the linear span
of window functions (sequences); there is, however, no restriction on
the constituent window functions (sequences). I also establish that the
mechanism of association is restricted to have all derivatives (those 
that exist) in the linear span of window functions, effectively
suggesting that in the connectionist approach to signal processor realization,
signals and their processors are both capable of being described in
comparable, possibly the same, ’basis’ space: this feature would be helpful in
a formulation of neural network based systems which decide on the 
processing characteristics of neural networks. 

A characterization of localization in terms of wavelet transforms is 
considered to suggest the operational sense of ’basis’ function synthesis 
in neural network representations. This characterization is different 
from that provided by Zhang & Benveniste (1992) and Pati & Krish- 
naprasad (1993). Concepts represented in neural signal processors 
are shown to reflect evaluation of intra-pattern and inter-pattern
features: the former is influenced by localization due to measurement and
aggregation kernels and the latter is a consequence of the mechanism 
of association between the integral transforms of measurement and ag- 
gregation. I also establish that localization in the intra-pattern and 
inter-pattern predicates restricts concepts represented by every node 
in a neural signal processor to a localized region, with one or more 
components, in the sheaf of input patterns. 

I have considered kernels of the reproducing type as a specific ex- 
ample of localization in the integral transforms of measurement and 
aggregation. These reproducing kernels have been shown to extend the 
notion of preservance (defined earlier on discrete input spaces) to input
spaces that are continuous, with the limitation, however, that not all
reproducing kernels are representative of preservance weights, and, in 
the same way, not all kernels representing preservance weights exhibit 
the reproducing property. 

Based on the discussion of Nashed & Walter (1991) that every
reproducing kernel is associated with a sampling theorem, I have
established that the nature of representation in neural signal processors
is in the sense of approximating concepts that are defined on continuous
domains through a finite number of (non-uniformly) spaced samples:
the finiteness of the number of samples is assured when the concepts
are of a localized nature and non-uniformity in sampling is admitted by
the Paley-Wiener sampling theorem (op cit). This result implies that
conventional neural networks (ie, networks of finitely many neurons,
each with finitely many inputs) represent concepts in a continuum if
the kernels are of the reproducing type.
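
The idea that a localized concept on a continuous domain can be approximated
from finitely many samples through a reproducing kernel can be illustrated
with the classical sinc kernel. This is only a sketch under simplifying
assumptions: uniform integer-spaced sampling is used instead of the
non-uniform sampling referred to above, and the test signal and function
names are chosen for illustration.

```python
import math


def sinc(t):
    """Normalized sinc: the kernel of the classical sampling theorem for unit spacing."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)


def signal(t):
    """A localized, band-limited test signal (assumed example)."""
    return sinc(t / 2) ** 2


def reconstruct(samples, t):
    """Approximate the signal at time t from finitely many integer-spaced samples."""
    return sum(value * sinc(t - n) for n, value in samples.items())


# Only 41 samples are kept; the signal decays, so the discarded tail is small.
samples = {n: signal(n) for n in range(-20, 21)}
error = abs(reconstruct(samples, 0.5) - signal(0.5))
print(error < 1e-2)
```

Because the test signal decays away from the origin, truncating the sample
set changes the reconstruction only slightly, which is the sense in which
localization makes a finite number of samples adequate in this sketch.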

An attempt at representing the (reproducing) kernels of the inte- 
gral transforms of measurement and aggregation via the paradigm of 
neural signal processing suggests that in the earlier stated notion of 
representation, the ’basis’ functions synthesized are related to mem- 
bers of a (wavelet) frame. The characteristics of the basic wavelets 
in the frame are decided by the degree of layering incorporated in the 
neural networks that synthesize the kernels of the measurement and 
aggregation integral transforms: the larger the number of layers, the
greater is the degree of localization effected by the basic wavelets.

Based on the characterization of representation in neural networks 
presented in this thesis, I have conjectured that the nature of repre- 
sentation in multi-layered networks is of the following kind: ’shallow’ 
networks are well suited for representing processors that have formal 
descriptions (ie, a description involving rules of association) whereas
’deep’ networks are necessary when the entities operating in a formal
system need to be identified/discovered. In other words, ’shallow’
networks are good in symbol processing while ’deep’ networks are
necessary for ’symbol synthesis.’ Present neuro-anatomical evidence does
not seem to refute this conjecture in that the cortex and neo-cortex,
the seat of (conscious) symbolic activity, are organized to have few layers,
each with a wide spread of interconnections. In contrast, the mid-brain,
whose functionality is not known in sufficient detail, but is believed to
be responsible for (sub-conscious) associations (part of which is the long
term memory trace), is a ’deep’ network with localized connections.


Organization of the Thesis 

The findings of my investigation, together with a review of signal
processing with neural networks, are organized as a report consisting of
seven chapters. An introduction to the idea of automated information
processing, stressing the connectionist approach to signal processing,
is presented in the first chapter. Some of the historical aspects of the
connectionist approach to information processing are also incorporated. 
This chapter also dwells on the motivations for the present investiga- 
tion and, as a preface, presents an overview of the thesis accompanied 
by an outline of the thesis organization. 

A review of signal processing with neural networks is presented 
in Chapter 2. The notion of signals, their processing and associated
abstractions followed by prominent models describing the processing
in isolated neurons and neuronal ensembles are briefly introduced to
provide the relevant background, terminology and notations. An outline
of the approaches available in the literature for the realization
of signal processors through neural networks is also incorporated. In
addition, the notions of intelligence and information processing are
cursorily reviewed in an appendix to supplement the contents of the first
two chapters. 

The issue of representing signal processors in isolated neurons is 
taken up in Chapter 3. In this chapter, I introduce the notion of preser- 
vance and establish the existence of preservance weights. Preservance 
is initially established on the collection of binary vectors in a Euclidean 
space of dimensionality n and extended to discrete spaces constructed 
from the collection of binary vectors through scaling and translation. 
A characterization of the discrete input space accommodating preser- 
vance, the collection of weights that form preservance weights and func- 
tions represented on such spaces are incorporated. This chapter ends 
with a discussion on the extension of preservance to discrete spaces
identified with numbering systems of a radix other than binary and 
a construction of preservance input spaces corresponding to arbitrary, 
but non-null, weights. 

Neural signal processor realization in layered ensembles of neurons
is the focus of Chapter 4. The influence of preservance on function
realization in single layered neural signal processors is taken up first and
this study is utilized in the study of function realization in multi-layered 
neural signal processors. An identification of preservance input spaces 
appropriate to the collection of inputs described in a training set and the 
attendant issues in the representation of input spaces are considered in
this chapter. The algebraic characterization of representation in neural
signal processors on discrete input spaces forms the final component of
this chapter. 

Characterization of neural signal processing architectures forms the 
theme of Chapter 5. In this chapter, I introduce neural signal processors
with types and consider the potential for representation in neural
signal processors: the processors are considered operating on continu- 
ous input spaces. The functional characteristics of neural signal pro- 
cessors, axioms of neural signal processing and the suggestion for an 
operational paradigm of neural signal processing are considered in this 
chapter. A study of representation in neural signal processors in terms 
of function approximation is the final topic in this chapter. 

In Chapter 6, the issue of localization in the functions represented 
by neural signal processors on continuous input spaces is investigated. 
The nature of localization is first studied in the case of isolated neurons 
and then the study is carried over to feed-forward layered ensembles 
of neurons. Characterization of localization in terms familiar in the 
literature of signal processing and implications of localization on the 
nature of processing are considered in this chapter. The ’basis’ functions 
through which signal processors are realized in neural networks are 
related to wavelet transforms. In Chapter 7, I summarize the findings
of my investigation and suggest directions of further study.

[Marco Polo:] ’Sire, now I have told you about all the cities I
know.’

[Kublai Khan:] ’There is still one of which you never speak.’
Marco Polo bowed his head.

’Venice,’ the Khan said.

Marco smiled. ’What else do you believe I have been talking to
you about?’

The emperor did not turn a hair. ’And yet I have never heard
you mention that name.’

And Polo said: ’Every time I describe a city I am saying
something about Venice.’

’When I ask about other cities, I want to hear about them. And
about Venice, when I ask you about Venice.’

’To distinguish the other cities’ qualities, I must speak of a first
city that remains implicit. For me it is Venice.’

’You should then begin each tale of your travels from the
departure, describing Venice as it is, all of it, not omitting anything
you remember of it.’

’Memory’s images, once they are fixed in words, are erased,’ Polo
said. ’Perhaps I am afraid of losing Venice all at once, if I speak
of it. Or perhaps, speaking of other cities, I have already lost
it, little by little.’


— Italo Calvino
in Invisible Cities,
Picador, London, 1979.

It is a degradation to a human being to chain him to an oar
and use him as a source of power; but it is an almost equal
degradation to assign him purely repetitive tasks in a factory
[or institution], which demand less than a millionth of his
brain power. But it is simpler to organize a factory or galley
which uses individual human beings for a trivial fraction of 
their worth than it is to provide a world in which they can grow 
to their full stature. 


— Norbert Wiener 

in The Human Use of Human Beings: Cybernetics and Society,
Houghton Mifflin Company, Boston, 1960. 

In our present sustained concern for ’progress’ and ’development’ the
associated incessant increase in demand for handling of ’materials’ could
not have been met at the present levels of success without recourse to
automation. ’Materials,’ in this discussion, need to be considered in a
sense more abstract than our common interpretation of material being
merely a morphological manifestation of matter.^1 Avoiding detail, it is
sufficient to note that the abstract notion of materials includes isolated
or combined participation of manifestations of matter, energy and
information: this abstraction regarding materials and material handling
is supported by the views of Diebold (1952)^2 and Stonier (1990).
This thesis is restricted to the ’information dimension’ of materials. 

Handicrafts, industrially fabricated products, energy in its various 
forms, human speech, music, images, computer programs and even 
mental states, viewed as states of the brain, may be considered as
examples of the abstract material. (See Churchland, 1986; Churchland
& Sejnowski, 1994, for a discussion on the materialist reduction
of mental events to events in the brain.) Automation is a historical 
process involving two key aspects. 

(a) An identification of certain human endeavors which, though rou- 
tine, are considered essential for human survival and well being. 

^1 The necessity for considering the notion of material and its handling at this abstract
level arises in the context of the importance we have assigned, in our daily lives, to each
of the concrete examples of the abstraction.

^2 References are cited through the last name(s) of the author(s) (or editor(s)) and the
year of publication of the manuscript. A list of references has been included at the end.

(b) An identification of processes/mechanisms by which routine human
activities can be mimicked anthropomorphically at least at
the functional, if not at the phenomenological, level: these ma- 
chines (mechanisms) are expected to compete with human pres- 
ence in routine material handling and even guide the genesis 
and ontology of [machine specific] metaphors (models), ultimately 
challenging human existence to conform to these [new] metaphors. 

Material handling, by automatic means, necessitates three basic 
component mechanisms: sensors, effectors and control (or coordina- 
tion). Sensors are needed to detect (or measure) physical occurrences 
(location and extent) and the acceptability of processing to which the
material has been subjected. The desired steps of processing on the
material (as simple as translations, in space and/or time) are implemented
through effectors. Control (coordination) ensures that material
flow through all sections of the automated ’plant’ is unencumbered and 
the product quality is assured at every stage of processing. In the most 
popular presentations of automated material handling, the three basic 
components are likened to human organs, viz, sensory organs, motor 
organs (chiefly limbs and fingers) and the brain respectively. 

Information processing, especially decision making, is one of the cru- 
cial aspects of the control (coordination) mechanisms. The complexity of 
information handling, ie, acquisition of information from sensors,
transmitting information between sensors and controllers as well as
controllers and effectors, implementation and/or execution of commands
by effectors and decisions to be taken by the controller, is increasing
day by day with escalating demands on ’productivity’ and increasing
sophistication in the technology of the material processing steps.
Incorporation of the abilities acquired in the automation of material handling
to the processing of information is an immediate and natural reaction
to overcome the hurdle of information processing complexity.

The automation of information processing generates the hope and
desire that intelligence, which for long has been regarded as a natural
privilege of human beings, is expressed through machines and that
this automated intelligence (normally termed artificial intelligence)
will be available for the control and coordination of automated systems
(including those involved in the automation of information processing).
Several approaches have been suggested for the mechanized expression
of intelligence; these can, however, be broadly categorized under two
distinct labels: top-down and bottom-up approaches, typical
representatives being symbolic artificial intelligence, or Classical AI, and neural
networks, or Connectionist AI, respectively.^3 A preliminary discussion
on the nature of automated intelligence and a cursory review of the 
approaches to artificial intelligence have been included in Appendix A. 


^3 These approaches have, ever since their inception, been dogged by controversies,
some arising out of a (furious) debate between the various schools of thought regarding
the nature of human intelligence and the approach to automated intelligence. (See
Olazaran, 1993, for a discussion of the sociological history of the controversies in the
approaches to artificial intelligence.)

In this chapter, as a preface to the thesis, I will outline the
connectionist approach to artificial intelligence in order to highlight the
(important) operational concerns and technical issues that have animated
the understanding of intelligence. As neural networks are increasingly
finding acceptance in the signal processing community, the motivations
for a study of connectionist information processing systems (in particular,
issues related to the representation of signal processors with neural
networks), followed by a statement of the problems addressed and a
preview of the thesis, have been included. An overview of the organization
of the thesis has been provided at the end of this chapter to facilitate
an easier movement through the contents.


1.1 Connectionist Artificial Intelligence 


Intelligence having been viewed as a consequence of information pro- 
cessing, one of the central issues of intelligence concerns information 
characterization and the nature of information representation.^4 The
central questions of automated intelligence, viewed in the connection- 
ist perspective enunciated by Rosenblatt (1958), are: 


^4 The view that intelligence is a consequence of information processing, while being
dominant, is not, however, shared by all interested in the automation of intelligence.
A criticism of information processing models in perceptual categorization and
generalization, the key aspects of an expression of intelligence, is offered by Edelman (1987).
This criticism brings out one of the important limitations of information processing models
of intelligence, viz, the Homunculus problem.

i. How is information about the physical world sensed, or detected,
by the biological system? 

ii. In what form is information stored, or remembered?

iii. How does information contained in storage, or in memory, influ- 
ence recognition and behavior? 

Perceptrons and neural networks inspired by biological information 
processing, in particular, the architecture of the brain, are claimed to 
address these issues. The central theme of the connectionist paradigm, 
despite the varied interpretations, is that 

Whatever information is retained must somehow be stored as a
preference for a particular response; ie, the information is contained
in connections or associations rather than topographic representa- 
tions. (The term response . . . should be understood to mean any dis- 
tinguishable state of the organism, which may or may not involve 
externally detectable muscular activity [ie, state in the language of 
dynamical systems].) 

This view, expressed by Rosenblatt (1958), is supported by other
investigators. (See Appendix A.)

Artificial neural networks, or neuromorphic systems, do not have a
standardized definition, or terminology.^5 However, the following
definition by Sage & Withers (1990), based on the definition suggested
in the DARPA Neural Network study, captures the consensus in the
existing definitions.^6

^5 This is partly because of the differing interests in the community of investigators
charmed into a study of neural networks.


A system composed of many simple processors, fully or sparsely
connected, whose function is determined by the connection topology
and strengths.

This system is capable of a high level function such as adaptation or 
learning with or without supervision as well as lower level functions 
such as vision and speech preprocessing. 

The function of the simple processors and the structure of the con- 
nections are inspired by the study of biological nervous systems. 


Hecht-Nielsen (1990) suggests a more technically elaborate definition 
for neural networks and Fiesler (1994) has proposed a standardization 
in the terminology of neural networks.

Before discussing the operational history of neural networks, it will
not be out of place to have a brief digression to understand the scope of
interpretation available with the term Artificial Neural Networks. On
the basis of an analysis of meaning, we can argue that four (subtly)
distinct, yet interacting, activities are valid [operational] candidates
under the common banner of artificial neural networks and each of these
is important in the automation of intelligence. These differing
interpretations occur due to minor variations from the DARPA definition.

®A moment’s indulgence in the luxury of abstraction would reveal that the common
refrain in all of the existing definitions is that neural networks are function fields over
partially ordered index spaces; as of yet, however, the collection of functions is indexed
over lattice-points. In an abstraction of this form, neural networks share the same
universe as (Universal) Turing Machines, Finite State Machines, Grammars, Normal
Algorithms, etc. With this abstraction, it is important to seek out the interplay between
inter-function interactions and the macroscopic functional specificities (or properties),
especially to understand the nature of cognition, ie, automated intelligence, that can be
accounted for by models sharing the above abstraction. Neural networks formulated as
function fields over partially ordered lattices will provide a framework well suited for a
study of the representational characteristics of universal neural networks.

The first of these interpretations stems from our common under- 
standing of the adjectives artificial and neural, wherein the discussion 
is of networks of [processing] elements each of which is a mimicry of iso- 
lated real world neurons, the specific characteristics of the processing 
elements and of the interconnections between the processing elements 
retaining empirically established properties. In this activity, the fo- 
cus is one of establishing models for isolated real world neurons and 
to meticulously study the various kinds of interconnections exhibited 
in biologically expressed neuronal ensembles (typically, the brain) and to
relate the observed structures to biological functions. Such a study is 
to be expected, commonly, in established departments of neurobiology 
(incorporating neuroanatomy and neurophysiology). 

An activity encompassing a study of methods by which to cultivate 
networks of neural-like processing elements, possibly as replacements 
of existing brains, aided by the interpretation of artificial neurons in 
the sense of synthesized neurons is the second valid interpretation of
the term artificial neural networks: it is not difficult to notice that such
an activity can be sustained only on the knowledge generated by activities
falling in the first category. This activity, essentially being a study 
of biologically compatible devices aiding (human) intervention in the 
otherwise (mal)functioning systems, is to be anticipated under heads 
like bio-engineering, bio-electronics, etc. Here, it is important to note 
that the stress is on getting the synthetic neurons to act as reasonable 
substitutes (or surrogates) to real life neurons. 

Interest in bio-engineering promotes an activity wherein cultures of 
neural ensembles are interfaced (electronically) to engineering systems 
and motivates inquiries into the computational advantages offered, 
over conventional electronics, by biological information processing sub- 
strates. This activity, foreseeable under heads like neuro-technology, 
bio-informatics, etc, and a third valid interpretation of artificial neural 
networks, relies on the knowledge provided by neuro-science to cater to 
specific processing requirements. In such an activity, the focus, typ- 
ically, is on realizing the desired information processing requirement 
given the characteristics of the (biological) processing substrate and in 
a sense, is not very different from activities in conventional electronics.

While the first category of activity underlines the possibility of
neuro-science and the second and third categories anticipate
neuro-[bio]-technology, we cannot surely miss out on another valid
interpretation allowing for an activity supporting a study of possible
alternatives to the characteristics of neurons (processing elements) and
inter-neural interconnections. This interpretation, driven by issues of
practical realizability, seeks out an exploration of possible structures
and thereby attempts an understanding of the dimension of automated
intelligence.

Indeed, this approach, which may be termed neuro-engineering, is the
one activity that has been the fancy of many an investigator in the
present time: a fancy, not necessarily in the sense of an irrational
choice in the presence of other viable information processing
possibilities, but in view of the fact that several information
processing situations are being handled by methods involving neural
networks, ie, neural networks are being viewed as a panacea for all
information processing situations. This thesis too will be confined in
its attention to the neuro-engineering aspect of operational
interpretation. It should, however, be noted that neuro-science and
neuro-philosophy (discussed mainly in Churchland, 1986) are not
unimportant for an understanding of neuro-engineering.

Research activity in neural networks has, since its beginning in the 
work of McCulloch & Pitts (1943), taken on all the above interpreta- 
tions, in particular those provided by the first and last categories. Per- 
ceptrons were the first non-trivial neural networks to be investigated. 
The focus in neural networks has always been one of information rep- 
resentation and one of the important manifestations of this focus, in 
addition to that of architectural types, is in the 'automatic' selection or
search of (tunable) connection strengths between processing nodes.
Biological inspirations have guided the labeling of this activity as
learning or training. An equally important consequence of the
representational focus is in generalization, ie, the problem of extending
the scope of the knowledge base, represented through learning, to input
situations not contained in the training repertoire: this capability is
projected as one of the strong points of the sub-symbolic paradigm over
the symbolic representational framework.

Initial investigations in perceptrons were limited to a single level of 
adaptive weights and the discovery of a training algorithm triggered in- 
tense investigations. However, this was to be short-lived as automated 
training of multi-layered networks turned out to be elusive. In this con- 
text, Minsky & Papert (1969) established some serious limitations: 


No diameter-limited perceptron [ie, a perceptron wherein each
constituent processing node evaluates a local predicate] can determine
whether or not all the parts of any geometrical figure [incident on
the retina] are connected to one another! . . .

Part of the attraction of the perceptron lies in the possibility
of using very simple physical devices - "analogue computers" - to
evaluate the linear threshold functions. It is perhaps generally
appreciated that the utility of this scheme is limited by the sparse- 
ness of linear threshold functions in the set of all logical functions. 
However, almost no attention has been paid to the possibility that 
the set of linear functions which are practically realizable may be 
rarer still . . . 



The perceptron has shown itself worthy of study despite (and 
even because of!) its severe limitations. It has many features to 
attract attention: its linearity; its intriguing learning theorem; its 
clear paradigmatic simplicity as a kind of parallel computation. 
There is no reason to suppose that any of these virtues carry over 
to the many-layered version. Nevertheless, we consider it to be 
an important research problem to elucidate (or reject) our intuitive 
judgment that the extension is sterile. Perhaps some powerful 
convergence theorem will be discovered, or some profound reason 
for the failure to produce an interesting "learning theorem" for the 
multi layered machine will be found.^ 


In addition, Minsky and Papert established that simple predicates like
that of parity (and connectedness) could not be represented by perceptrons
and pointed out the futility of the usage of perceptrons in view of the
limited set of representable predicates.
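
The parity limitation can be checked by hand on its smallest instance, two-input parity (XOR) on a single threshold unit. The following Python sketch is only an illustration: the names `threshold_unit` and `realizable` and the small integer search grid are assumptions introduced here, and the exhaustive search does not reproduce the argument of Minsky and Papert.

```python
from itertools import product

def threshold_unit(w1, w2, theta):
    """Single linear threshold unit: responds 1 when w1*x1 + w2*x2 >= theta."""
    return lambda x1, x2: int(w1 * x1 + w2 * x2 >= theta)

def realizable(truth_table, grid=range(-3, 4)):
    """Search a small integer grid (an assumption) for weights and a threshold
    reproducing the truth table; return the first match, or None."""
    for w1, w2, theta in product(grid, repeat=3):
        unit = threshold_unit(w1, w2, theta)
        if all(unit(*x) == y for x, y in truth_table.items()):
            return (w1, w2, theta)
    return None

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}  # two-input parity

print("AND:", realizable(AND))
print("XOR:", realizable(XOR))
```

For XOR the search fails on any grid, not only this one: the four rows require theta > 0, w1 >= theta, w2 >= theta and w1 + w2 < theta, while the first three give w1 + w2 >= 2*theta > theta.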


^It is noteworthy that in the same year that Minsky and Papert came
out with their book, Bryson & Ho (1969) suggested, in the field of optimal control,
an algorithm for automatic specification of parameters in multi-stage controllers. This
algorithm shares many features with other (stochastic) gradient search algorithms like
the Delta rule and Error Back-Propagation.

These observations motivated several investigators to opt for
information processing approaches different from perceptrons and quite a
large number got interested in the symbolic processing paradigm. Quite
independent of investigations in perceptrons, Widrow & Hoff (1960)
(see also Widrow & Winter, 1988) had investigated the suitability
of perceptron-like ADALINES (adaptive linear elements) and MADALINES
(many ADALINES) in signal processing applications. In these
investigations, a variant of the stochastic gradient search algorithm,
named the Delta rule (also the Widrow-Hoff learning rule), was used for
the automatic specification of interconnection strengths.
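
A minimal sketch of the Delta rule for a single adaptive linear element may help fix ideas. It assumes the usual least-mean-squares form of the rule, in which each weight is moved in proportion to the output error times its input; the function name, learning rate, epoch count and sample data below are illustrative assumptions, not values taken from the cited works.

```python
import random

def train_adaline(samples, lr=0.1, epochs=200, seed=0):
    """Delta (Widrow-Hoff) rule on one linear element.
    samples: list of (input tuple, target value) pairs."""
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # stochastic presentation order
        for x, target in data:
            output = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = target - output
            # Gradient step on the squared error of this one sample.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Hypothetical noise-free examples of the association y = 2*x1 - x2 + 0.5.
grid = (-1.0, 0.0, 1.0)
samples = [((x1, x2), 2.0 * x1 - x2 + 0.5) for x1 in grid for x2 in grid]
weights, bias = train_adaline(samples)
print(weights, bias)
```

With consistent examples such as these, the estimates should settle close to the generating coefficients; with noisy examples the rule only approaches a least-squares compromise.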

Interest in neural network activity dramatically increased with the
discovery,** by Hinton, Rumelhart and McClelland (Rumelhart, Hinton
& Williams, 1986; McClelland, Rumelhart, et al, 1986a), of a
learning rule, based on error back-propagation, for multi-layered
perceptrons; this rule has been shown to follow the same principles
as the Delta rule (Matheus & Hohensee, 1987). Hopfield’s discovery
of the applicability of neural networks to associative memories
(Hopfield, 1982) triggered interest in neural networks among
physicists (leading to a reduction of neural networks to mean-field
theory, Amari (1983), Ising spin systems, van Hemmen (1986), van
Hemmen, Grensing, et al (1988a, 1988b) and attractor dynamics, Amit
(1989)) and the signal processing community.

** The controversy related to the discovery of learning in multi-layer neural networks
is traced by Olazaran (1993).

By then, Kohonen (1977, 1980, 1984), Fukushima (1969, 1970,
1980), Fukushima & Miyake (1982), Grossberg (1980, 1982) and
Carpenter & Grossberg (1986a) had made substantial contributions
to the usage of neural networks in visual signal processing: however,
these got to be widely recognized only after the publication of
investigations by Rumelhart and Hopfield. As neural networks regained
acceptance as a viable computational paradigm, focused inquiries into
their capabilities have been pursued. Valiant (1984) has addressed
the question of learning, though as a non-empirical enquiry. The issue
of learning complexity has been investigated by Judd (1990). Learning
with generalization has been the focus of investigations by Valiant
(1984) and Baum & Haussler (1989) and has, in addition, been related to
the Vapnik-Chervonenkis dimension (Hertz, Krogh & Palmer, 1991).

Connectionism, in its new form, attracted attention not only at the
level of theoretical and procedural issues of processor representation,
but also at the level of hardware realization. Mead, the pioneer of VLSI,
proposed schemes for the realization of 'analog VLSI' (Mead, 1989; Mead
& Ismail, 1989) and designed an electronic retina and cochlea (Lyon
& Mead, 1990), to emulate the human visual and auditory sensations.
In view of the fact that connectionism seeks knowledge representation
in inter-processor interconnection strengths, that the individual
processors are very small (owing to simplicity) and that VLSI technology
too mandates similar requirements, VLSI implementation of processors
with a neural basis leads to efficient wafer utilization. Neural
networks, especially of the dynamical kind, have been employed in the
routing of (conventional/symbolic) VLSI processors (Jayadeva, 1993).

In view of practical difficulties faced in the training of neural
networks, typically the long sessions of training, the enormous number of
processing nodes for realistic problems and the general inability to
easily correlate rules and features to individual nodes, hybrid
approaches, taking on a mix of concepts from the symbolic and
connectionist traditions, have been investigated in the literature. This
approach, strongly inclined towards a Cartesian dualism, views neural
networks as a pre-symbolic computational substrate providing the
necessary interface between the 'external' world and the symbolic
computational units.


1.2 An Overview of the Thesis 

Neurons, in the present investigations of connectionist information
processing, are operational generalizations of the threshold and
decision units of Rosenblatt’s perceptrons, and representation of
knowledge available (or given) through examples of association
continues to be sought through a storage of information in the
connection strengths between processing units; the information stored is
unrelated to the individual patterns incident on the 'retina'. However,
the information stored in the connection strengths, also known as
weights, is related to the functional dependencies specified (the
specification is, in general, not exhaustive) through examples of valid
association. Though this aspect has been appreciated and used, to
advantage, in current investigations, no study seems to explicate the
nature of preservation.

Recognition of patterns and function approximation (signal/state
estimation) being the core of information processing, especially in
automated systems, the limitations of processor realization in single
neurons have necessitated processing schemes involving an interconnected
ensemble of neurons. Processing schemes involving layers of neurons
are typical, the interconnections being between neurons in different,
generally adjacent, layers and, in selective cases, between neurons of
the same layer. While significant claims have been made, in the
literature, regarding the adequacy of layered neural networks, with a
single or multiple layers of decision making, in the recognition of
patterns and function approximation, a relative evaluation of the
representational effort needed in these networks does not seem to have
been considered.

Present research on neural networks, inspired by the requirements
of automating intelligence, has resulted in a plethora of networks, each
concentrating on capturing specific aspects of input-output
relationships. Discussions of these networks are also accompanied by
elaborate procedures for the automated specification of processor
parameters (ie, learning, or automatic programming). However, in this
collective enquiry, as much theoretical as empirical, the axioms
governing the study are not generally stated. While it would be incorrect
to state that connectionism is merely a collection of ab initio axiomatic
statements and the ensuing logical discourse, it would be inappropriate
not to seek the logical basis underlying the structure of current
thinking.

The notion and nature of representation in neural networks have
not been explicitly stated, though a common theme discernible in the
literature is that neural networks accommodate a localized
representation of the input patterns at the output. This facet of neural
networks has been, time and again, referred to as the ability of
providing a plausible account of concepts induced on feature space.
Connectionism, in the currently available literature, is generally
regarded as being restricted to realize functions between numerical
spaces, though there seems to be no obvious reason why such a restraint
should be operating. Indeed, the spirit of connectionism seems to
exclude realization of functionals, operators, relations and mappings
between more general spaces: one of the far reaching implications of this
restriction is to miss the opportunity to automate (in a neural basis)
the decisions (information processing) related to the design and
operation of neural networks.^

^If connectionism is extended to spaces more general than numerical, statements
regarding neural networks, essentially mappings between function, functional, or operator
spaces, could possibly be captured in neural networks. This ability would pave the way
for a definition of a universal neural network, through which a study of the limitations
of connectionism in cognitive modeling and proper comparison (possibly a unification)
between connectionist and classical AI could be thought of.

In this thesis, I consider a characterization of the representation
of signal processors with neural networks, under the topic of
connectionist signal processing systems: however, I do not lay claim
towards exhaustiveness in the study reported herewith. Representation is
interpreted, in this thesis, as a decomposition and/or synthesis of the
desired function through 'basis' functions that are not chosen a priori
but are synthesized to suit the requirements of processor realization:
the requirements are specified through finitely many 'examples' of the
desired association. The 'basis' functions are, however, not restricted
to be dependent on the family of functions under consideration.

Preservation of information in the processor realization offered by 
individual neurons, representational features of layered networks of 
neurons, paradigmatic concerns of representing signal processors with 
neural networks and characteristics of representation, in signal pro- 
cessing terms, offered by interconnected ensembles of neuron layers 
are the four broad topics that have been investigated. The principal 
claims of the study are listed below. 

(a) Every weight associated with a neuron, in the sense of channeling
information from different sites in a network (including elements
of the incident input pattern) to the decision component of the
neuron, preserves mappings on certain discrete pattern collections
as sequences; this preservation reduces learning to an enumeration
of weights (the enumeration is not without structure) and a
search for threshold in a linearly ordered space.

(b) Multi-layered neural information processors, independent of the
degree of layering, are adequate for representing functions on
preservance input spaces, the demand on the number of processing
nodes decreasing, in general, with the degree of layering: this
adequacy translates into an assurance of the possibility of
extending connectionist information processing to symbol spaces
wherein linear separability, a basic characteristic required of the
processing nodes, is stated algebraically as the partitioning
induced by a dichotomy on a (semi) lattice such that each member
of the partition is a semi-lattice.

(c) Processor representation is achieved in neural networks through
point-wise nonlinear associations between integral transforms;
the kernels, generally nonlinear, of the integral transforms when
synthesized through neural networks allow an incorporation of a
priori knowledge of the processing architecture and functionality.

(d) Function representation in neural signal processors is accom- 
panied by localization and concepts, identified as processor re- 
sponses, reflect a restriction of evaluation, expressed as a weighted 
average of representations in wavelet frames, to localized regions 
in the sheaf of input patterns. 

On learning the motivations for the investigations reported herein, 
it is now imperative to know the salient queries that need to be ad- 
dressed in order that the study be satisfactory. Representation of pro- 
cessors in neurons being of a primary nature, answers to the following 
queries would enable an understanding of function representation in 
ensembles of interconnected neurons. 

i. Do there exist assignments to the connection strengths in a
neural network that ensure a preservation, of relative order between
inputs, in the representation of functions?

ii. If connection strengths of the kind indicated above do exist, what
is the extent of the input pattern space that is subject to the
criterion of preservation?

iii. How does the existence of connection strengths that preserve
relative order between inputs affect the issues in function
representation, especially learning and generalization?

Preservation of relative order between inputs has been considered 
as the basis of the above enquiry noting that partitioning effected, on 
the input space, by decision elements, based on an approach of com- 
parison (of discriminants with a threshold), is decided to a large extent 
by the relative ordering enforced on the inputs in the process of evalu- 
ating the discriminants. Representational schemes that can be shown 
to be generalizations of positional numbering have been shown to pro- 
vide preservation in the sense mentioned above and I have established 
the existence, with an identification, of certain discrete spaces, corre- 
sponding to each non-null configuration of connection strengths, that 
accommodate a preservation in function representation. A reduction of 
learning to the distinct steps of weight (connection strengths) enumer- 
ation and a search for threshold in a linearly ordered space has been 
shown. The interplay of the issues of generalization and learning in the
selection of weights and threshold is also indicated, though cursorily.
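
The reduction of learning to weight enumeration and a threshold search can be pictured on a toy case. The sketch below is only one possible reading and rests on assumptions introduced here: inputs are binary patterns, the weights are the radix-2 place values of positional numbering, enumeration means trying the permutations of those place values, and the names `discriminant`, `find_threshold` and `learn` are hypothetical. The discrete spaces and enumeration established in the thesis are more general than this.

```python
from itertools import permutations, product

def discriminant(weights, pattern):
    """With place-value weights each binary pattern gets a distinct integer,
    so the patterns are arranged in a linear order."""
    return sum(w * x for w, x in zip(weights, pattern))

def find_threshold(weights, examples):
    """Scan the linearly ordered discriminant values for a threshold that
    separates the examples; return None if there is none."""
    values = sorted({discriminant(weights, x) for x, _ in examples})
    for theta in values + [values[-1] + 1]:
        if all((discriminant(weights, x) >= theta) == bool(label)
               for x, label in examples):
            return theta
    return None

def learn(examples, n_inputs, radix=2):
    """Enumerate assignments of place values to inputs, then search a threshold."""
    place_values = [radix ** i for i in range(n_inputs)]
    for weights in permutations(place_values):
        theta = find_threshold(weights, examples)
        if theta is not None:
            return weights, theta
    return None

def target(x):
    """Hypothetical association to be represented: x0 OR (x1 AND x2)."""
    return int(x[0] or (x[1] and x[2]))

examples = [(x, target(x)) for x in product((0, 1), repeat=3)]
weights, theta = learn(examples, 3)
print(weights, theta)
```

In this toy case the enumeration settles on the assignment that gives the first input the largest place value, after which a single scan of the ordered discriminant values yields the threshold.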

Assurance of the existence of assignments to connection strengths
that allow a preservation, of relative order between inputs, in function
representation leads to an interest in the manner in which the feature
of preservation can be used to understand the nature of function
representation in layered neural information processing systems. In this
connection, the following enquiry would be helpful.

i. How does preservation, of relative order between inputs, affect 
function representation in layered neural information processors? 

ii. How is the representation of functions in layered networks of 
neurons affected by the degree of layering? 

iii. Given that preservation, of relative order between inputs, is
applicable on certain discrete spaces, does there exist an interpretation
of the processing functionality that would allow an extension of
neural computation to symbol spaces?^®


^®Due to the much acclaimed success of neural networks and the projected differences
between symbolic and sub-symbolic (neural) approaches to information processing, it is
important to know whether, or not, the paradigm of neural networks is restricted solely
to approximation of functions on continuous (numerical) spaces, ie, is it ever possible
to extend [naturally] neural processing to functions defined on abstract spaces, typically
symbol spaces? In order that the insight we try to get of the representational paradigm
in artificial neural networks is non-trivial, an essential requirement is that the notion
of representation should be applicable independent of the nature of neurons used in
processor realization. This notion, in addition to providing a basis for acquiring a unified
understanding of neural signal processing schemes, could be useful in exploring the
possibility of using the neural paradigm in the decision making and search problems
related to neural networks; this, if successful, would project the neural computational
paradigm as an alternative to formal automata, a typical example being Turing Machines.

I have established that while the scope of function representation,
discussed in the restricted context of functions on the discrete spaces
in which preservation (of relative order between inputs) holds, is
really unaffected by the degree of layering, the nature of representation
varies in the sense that an increased degree of layering leads, in
general, to a realization of the given function with fewer processing
nodes than necessary in the case of processors involving a single layer
of neural decision elements. An alternative definition to linear
separability, the basic processor characteristic in most neural
networks, has been established: I reproduce the definition.^^

A dichotomy on a (semi) lattice is said to be linearly separable if the 
[embedding] lattice can be expressed as a partition, each component 
of the partition being a semi-lattice. 

Concerning the representation of signal processors with neural net- 
works we are faced with the following essential questions. 

i. What plausible axioms are necessary for a discourse in neural 
signal processing? 

ii. What operational interpretation would allow for a unification of 
existing neural architectures and suggest novel architectures? 

iii. What is the nature (and characteristic) of representation in neural
signal processing?

^^This definition allows us to appreciate that the spirit of connectionism need not be
restricted to mappings between numerical spaces.






These questions refer to the nature of information storage and han- 
dling in neural networks. I have shown that four axioms are essential 
to neural signal processors: these are reproduced below. (The axioms 
described below are related only to the operational character of neural 
signal processing and do not stipulate either the components or the 
context in which such processing is realized.) 

1. Axiom of Organization. 

A neural signal processor is composed of (layers of) three opera- 
tional stages: measurement, discrimination and aggregation in 
that order. Preprocessing, if any, (preceding, or incorporated in, 
the measurement) is sought to be represented in a neural basis. 
Measurements are effected on an observation space constructed as 
the Cartesian product of the input space and a relevant subspace 
of a union of the space of responses of the distinct layers. 

2. Axiom of Measurement. 

A neural signal processor, through the measurement functions in 
each of the processing (decision making) nodes, induces a foliation, 
of codimension at least one, in the input manifold. This foliation 
forms the basis of synthesizing (approximating) the desired level 
curves of the function. 

3. Axiom of Discrimination. 

A neural signal processor, through its discriminatory functions,
renews the foliations, induced on the input space by the
measurement functions, through a transformation, of the atoms of the
foliations, with at least one of the following properties:

(a) alter the indexing of leaves to retain distinctness in a finite
non-zero number of local regions of the input space,

(b) introduce multiple components in the leaves,

(c) associate, to at least one component of a leaf of the foliation
due to discrimination, uncountably many leaves of the
foliation due to measurement.

Re-foliations provide the basis for establishing equivalences
between members (elements) of the input space in ways not possible
through the chosen measurement functions.

4. Axiom of Aggregation. 

A neural signal processor, through its aggregation function, syn- 
thesizes (or approximates) the level regions of processor response 
through a foliation on the Cartesian product of the stems of fo- 
liations on the input space due to discrimination. Concepts, in 
neural signal processors, are identified with the level regions of 
processor response. 
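
The three operational stages named in the axioms can be pictured on a conventional layer of units. The sketch below is one reading under stated assumptions: measurement is taken to be a linear discriminant (whose level sets, parallel hyperplanes, play the role of leaves), discrimination is a point-wise sigmoid or hard limiter, and aggregation is a weighted sum. All function names and numerical values are hypothetical and are not prescribed by the axioms.

```python
import math

def measure(weights, threshold, x):
    """Measurement: a linear discriminant. Inputs with equal value lie on
    the same leaf (a hyperplane) of the induced foliation."""
    return sum(w * xi for w, xi in zip(weights, x)) - threshold

def sigmoid(m):
    """Smooth discrimination: monotone, so distinct leaves stay distinct."""
    return 1.0 / (1.0 + math.exp(-m))

def hard_limit(m):
    """Hard discrimination: every leaf on one side of the threshold is
    merged into a single response."""
    return 1.0 if m >= 0 else 0.0

def aggregate(coefficients, responses):
    """Aggregation: combine the discriminated responses of the nodes."""
    return sum(c * r for c, r in zip(coefficients, responses))

def processor(x, nodes, coefficients, discriminate=sigmoid):
    """Measurement, discrimination and aggregation, in that order."""
    responses = [discriminate(measure(w, t, x)) for w, t in nodes]
    return aggregate(coefficients, responses)

# Hypothetical two-node processor on a two-dimensional input space.
nodes = [((1.0, 1.0), 1.0), ((1.0, -1.0), 0.0)]
coefficients = [1.0, 0.5]

a, b = (2.0, 0.0), (3.0, 1.0)  # different leaves of the first measurement
print(processor(a, nodes, coefficients), processor(b, nodes, coefficients))
print(processor(a, nodes, coefficients, hard_limit),
      processor(b, nodes, coefficients, hard_limit))
```

With the hard limiter the two inputs fall in the same level region of the processor response although the first measurement function distinguishes them, which is the kind of equivalence that re-foliation is said to make available.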

From the signal processing perspective, neural signal processors,
viewed as integral transforms interacting via point-wise nonlinear
transformations, provide the key to unify the several architectural
types.^ Indeed, the interpretation of neural signal processing as
nonlinear transformations between integral transforms relates nicely to
the view that decisions are taken on feature spaces and that feature
extraction (also termed pre-processing) can be sought to be realized in a
neural basis, in a manner similar to the feature processing. Neural
signal processors with sigmoidal activation functions effecting
nonlinear associations on linear discriminants have been shown to be
adequate to represent all the architectural novelties suggested by the
axioms of neural signal processing. This investigation, however,
provides an insight into the nature of representation in a
superposition of functions, each related to the other through a
permutation of weights. Such superpositions, in neural signal processors
with sigmoidal activation functions, have been shown to realize
functions necessitating activation functions that are non-sigmoidal. I
have also indicated that the kernels, generally nonlinear, used in
neural signal processors when realized through neural signal processors
involving multiple layers relate to issues involving the incorporation
of a priori, but partial, knowledge about the interconnection strengths
between processors.

^The motivation for seeking a unified understanding can be seen in the following
perceptive remarks. (All these statements are found in Machlup & Mansfield (1983a),
p. 7-8.)

Several analogies have been used to characterize isolationist or parochial
attitudes of specialists uninterested in cognate or complementary fields of
inquiry. For example, they erect fences around their fields - like unsociable
property owners inhospitable to their neighbors.

[F]ields of scientific work . . . which have been explored from the different
sides of pure mathematics, statistics, electrical engineering, and
neurophysiology, in which every single notion receives a separate name from
each group and in which important work has been triplicated or quadruplicated,
while still other important work is delayed by the unavailability in
one field of results that may have already become classical in the next field.
[A case in point is the discovery of algorithms for learning in multi-layer
perceptrons.]

It is these boundary regions of science which offer the richest opportunities
to the qualified investigator. (Wiener, 1948, p. 2.)

[S]cience is split into innumerable disciplines continually generating new
subdisciplines. In consequence, the physicist, the biologist, the psychologist
and the social scientist are, so to speak, encapsulated in their private
universes, and it is difficult to get one word from one cocoon to the other.
(von Bertalanffy, 1968, p. 30.)

The Republic of Learning is breaking up into isolated subcultures with only
tenuous lines of communication between them ... an assemblage of walled-in
hermits, each mumbling to himself words in a private language that only
he can understand. (Boulding, 1956, p. 198.)

However, in his plea for interdisciplinary collaboration, Boulding warned
that "it is all too easy for the interdisciplinary to degenerate into the
undisciplined." (Ibid, p. 13.)

The local nature of representation characteristic of neural signal 
processors, discussed earlier, motivates the following enquiry. Note that 
this investigation proceeds in the framework of neural signal processors 
being point-wise transformations between integral transforms. 

i. How do the kernels of integral transforms and the mechanism 
of nonlinear association influence the nature of localization in 
function representation? 

ii. What, in terms of processing of signals, are the characteristics of 
localization in the neural approach to function realization? 

iii. What is the implication of localization on the nature of information
processing realized through neural signal processors?

Localization in neural networks induced by the kernels of integral
transforms, through considerations of realization in the connection
strengths (and thresholds), has been shown to be related to the
predicates evaluating relative organization of assignments within a
pattern: this aspect of localization has been qualified through the term
intra-pattern predicates. In contrast, localization due to the mechanism
of nonlinear association relates to predicates evaluating relative
organization of assignments between patterns: this aspect has been
qualified through the term inter-pattern predicates. In addition,
concepts represented by nodes of neural (signal) processors have been
shown to be localized regions in the sheaf of patterns, each concept
being the consequence of a conjoint evaluation of intra-pattern and
inter-pattern predicates. This latter statement suggests that
representation in neural signal processing is neither localized in the
sense of individual nodes being identified with distinct concepts nor
does it involve the entirety of nodes in a network participating in the
synthesis of a concept.

Kernels of the reproducing type, as a choice for the integral
transforms of measurement and aggregation, have been considered in the
localization of representation in neural signal processors. Such a
localization shows that the nature of representation is in the sense of
the measurements effecting a reconstruction of the incident (local)
concept through finitely many (non-uniformly spaced) samples. The
synthesis of 'basis' functions, in neural signal processors, that
support a representation of the desired processor has been shown to be
in the sense of the measurements effecting a representation of the
incident concepts in the basis functions that are used to realize the
kernels of the integral transforms of measurement, and the responses
(aggregates) effecting a representation of the decisions
(discriminations) on the measurements in the basis functions that
realize the kernels of the integral transforms of aggregation. Such a
representation has also been related to the notion of representation in
wavelet frames.


1.3 Organization of the Thesis 

Chapter 2 is devoted to a review of signal processing with neural
networks, the central theme of this thesis. The review is divided into
four components, each corresponding to a separate section. Signal
processing, at a reasonably abstract level of formulation, and some of
the established approaches to signal processing are considered in § 2.1
(p. 35). This section also attempts to bring out the importance of
signal and system representation and to trace some of the key
requirements of signal processors. A review of artificial neural
networks is taken up in § 2.2 (p. 54). An abstract formal model of
single neurons and specific (existing) models of interest captured by
the abstraction present the background against which the need for
networks of neurons and the architectures of neural networks are
studied.

The abstractions of signal processing and neural networks are utilized
in § 2.3 (p. 82), which dwells on a review of neural signal processing.
An effort is made to provide a glimpse of the history of neural signal
processing. This is not irrelevant, especially in a context wherein the
attempts to process signals with neural networks were initiated soon
after the invention of perceptrons but have attracted attention only
since the mid-1980s, a period which also saw furious debates on the
nature and relevance of artificial intelligence, with serious attacks on
classical and, to a lesser extent, connectionist approaches to AI.

A study of processor representation in isolated neurons has been
presented in Chapter 3. The existence of assignments to connection
strengths that allow a preservation of relative order between inputs,
and a characterization of the input subspaces that accommodate this
preservation, have been established in § 3.1 (p. 110). Function
representation with preservation and an appraisal of linearly separable
functions, the only dichotomies represented by neurons with binary
comparators, have been considered in § 3.2 (p. 138). Learning, in a
context where preservation is supported, is reinterpreted in § 3.3
(p. 154) and the interplay between learning and generalization is
indicated. In § 3.4 (p. 171) the notion of preservation is extended to
numbering systems with radices different from binary, and an
identification of preservance input spaces (input spaces preserved by
any arbitrary, but non-null, assignment to connection strengths) has
been sought.

I introduce, in Chapter 4, the notion of neural signal processing and
investigate the influence of preservation, of relative order, on
function realization. § 4.1 (p. 188) is a discussion of function
representation on single layered neural signal processors. Multi-layered
varieties are investigated in § 4.2 (p. 201) from the perspective of
function representation in the context of preservation. In § 4.3
(p. 208) I discuss the issue of identifying preservance input spaces
appropriate to a given training set, an equivalent of learning the
weights of the first layer. § 4.4 (p. 218) introduces an algebraic
equivalent of the notion of linear separability and concludes with a
cursory look into the possibility of representing functions between
symbol spaces.

Representational issues in neural signal processing architectures
form the theme of Chapter 5. Neural signal processors, with types,
are defined in § 5.1 (p. 237). The potential for representation in
neural signal processors is also investigated in this section. I
establish, in § 5.2 (p. 249), the functional characteristics of neural
signal processors and state the axioms of organization, measurement,
discrimination and aggregation. Neural signal processing has been
interpreted in § 5.3 (p. 276) as involving point-wise nonlinear
associations between integral transforms: these transforms relate to
measurement and aggregation operations. In § 5.4 (p. 284)
representation in neural signal processors has been considered from the
perspective of function approximation.*

Localization in the functions represented by neural signal processors
has been investigated in Chapter 6. In § 6.1 (p. 316) I have discussed
the influences of kernels on localization; this discussion has been
developed in the context of isolated neurons. An extension of the kernel
influence on localization in functions represented in layered neural
signal processors, accompanied by an investigation of the localization
influenced by the mechanism of nonlinear association, forms the
discussion of § 6.2 (p. 322). A characterization of localization in
terms of signal processing and the implications of localization on the
nature of processing effected by the neural approach to information
processing have been considered in § 6.3 (p. 332). The influence of
kernel structure on the representation in neural signal processors has
been studied in § 6.4 (p. 344).

* Note that the essential nature of intelligence consequent on
information processing reduces to approximation of functions describing
the relevant class/region memberships.

In Chapter 7, the final chapter, I sum up the conclusions of the
investigations in the preceding chapters. The relevance of some of the
key results and interesting directions of further study have also been
incorporated in this chapter. In addition, two appendices have been
included. Appendix A provides a glimpse of the prominent approaches
suggested for an automation of intelligence. The notations used in this
thesis have been listed in Appendix B. A list of references cited in the
thesis has been included after the appendices.


So what comes of our making,

Is slices of history pieced together, 

In full knowledge that 

All history is distortion of reality, 

And appropriate fillers added to reinforce 

The image of what we want it all to be in retrospect, 

At best a Romance of sorts. 

And it's a Romance that keeps us going, 

Sometimes asunder. 


— Weepy Sinner (Prof V P Sinha) 
in History and Romance: A Joem or a Poke, 
Indian Institute of Technology Kanpur


Recently, neural networks, a class of hierarchical nonlinear dynamical
systems, have become the focus of attention in connection with the
realization of nonlinear signal processors. In several instances of
signal processing operations, eg, recognition of handwritten characters,
facial recognition, texture identification, sonar signal classification,
speech signal processing, etc, parameterization or grammatical
formulation of the signal space, common to the conventional approaches
to signal processing, is not feasible due to insufficient understanding
of the processes underlying the signal generation and/or the volume of
(empirically observed) data being grossly insufficient to allow
(reliable) closed-form input-output relationships to be established. At
a lay level of interaction, such operations are believed to involve a
subjective element in the processor.

These operations, however, have finitely many examples of the input
signal, ie, prototypes, for which the corresponding outputs are known
either completely or partially. It is of interest to realize the
corresponding processor in such a manner as to extract invariances,
related to the processor, from the finite number of examples and to
incorporate the extracted invariances while estimating the processor
output corresponding to input signals not included in the repertoire of
prototypes. Neural networks, essentially computational models inspired
by current accounts of (human) cognitive abilities and grounded in the
accumulated understanding of the biological substrate supporting
information processing, present a framework supporting both
requirements: the process of extracting invariances is addressed, as a
representational issue, through problems of learning, and the
incorporation of (extracted) invariances in processing is formulated as
a problem of generalization.

In this chapter, I review, cursorily, the status of current research
related to signal processing with neural networks, to indicate the
relevant background, terminology and notations: the approach has been to
focus more on the issues important in neural signal processing than on
enumerating specific accomplishments of processing with neural
networks. Abstractions in understanding signals, their processing and
the classification of processors are the focus of § 2.1. In § 2.2
(p. 54), artificial neural networks are reviewed preparatory to an
understanding of neural signal processing. The history of neural network
based signal processing and the current understanding of architectures,
algorithms and usage of neural networks are considered in § 2.3 (p. 82).


2.1 Signal Processing: Crucial Issues and Necessities

Processing of information is an essential requirement of automation, in
particular of the mechanized expression of intelligence, and signals
provide a vehicle (or medium) for expressing the desired representations
of entities, events and objects in the physical world (also termed
reality). Such a representation is necessitated by the requirements of
information manipulation, symbolic or otherwise. As signals and the
information they convey acquire an ontological status and come to be
recognized as valid members of the physical world, thereby necessitating
information, in addition to matter and energy, as an essential dimension
of material manifestation, signals could, indeed, be representing
properties, traits, or qualities of materials, including signals.¹

Signals are commonly expressed as functions (processes) describing
the entity or object under consideration as a dependence on a narrow,
localized, region of the space-time continuum we are accustomed to
term the Universe; it is common, though not essential, to use numerical
assignments, or assignments involving vectors of numbers, to the domain
and range of the signals. The functions, rather than being arbitrary,
are expected to conform to the physical and/or biological constraints,
if any, involved in the process of sensation.

In this view, we have visual sensations described as a matrix of
numbers, auditory sensations described as a sequence of numbers, etc,
as valid examples of signals. Commonly real numbers and, to a certain
extent, complex numbers are used in encoding the domain and range
values of signals. With advances in digital processing technology, recent
efforts aimed at symbolic encodings for domain as well as range values
and the associated algebraic structure of processors attempt to exploit
the symbolic processing methodology.


¹ While it is expected of signals to be used as a meta-language describing
objects in the physical world, it is not incorrect to include signals in
the object language, particularly in the context of the abstract notion
of material introduced in the previous chapter.

The information content of a signal is sought in the relative
organization of assignments over the domain and, in this sense, signals
are the means of information interchange between communicating
processes. In classical AI, the relative organization is expressed as a
predicate of an appropriate mode of logic, while in statistics and
connectionist AI, descriptions of relative organization are to be found
in the signal statistics (distributions): incidentally, the two are not
unrelated and in this thesis I will use the term predicate to mean such
a relative organization. One of the key requirements in the processing
of information (pattern recognition) is to identify the predicates
relevant to the task at hand,² and to detect the predicate(s) applicable
to the incident signal.

In the processing of signals, it is common to find that signals
received from processes separated in space and/or time are not identical
to the original or intended one; the relative organization of
assignments in the received signal could thereby be different from that
in the original signal. Under these circumstances, signal processors, in
order to satisfy the information processing requirement, are equipped
with a distance measure, with which predicates corresponding to the
incident signal are evaluated against the repertoire of relevant
predicates (possibly updated in the progression of time), and the signal
is classified on the basis of this evaluation, generally in the sense of
least deviation, which is based on the topological notion of
continuity.³

² In classical AI the identification of relevant predicates (hypotheses)
is formulated as the problem of (knowledge) representation, while in
neural networks, the same is accomplished in the learning phase/mode.
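
The classification by least deviation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `classify` and `squared_distance`, the prototype labels, and the choice of squared Euclidean distance as the distance measure are hypothetical and are not taken from the thesis.

```python
def squared_distance(u, v):
    """Assumed distance measure: squared Euclidean distance between two signals."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def classify(signal, prototypes, distance):
    """Return the label of the stored prototype that deviates least from `signal`."""
    return min(prototypes, key=lambda label: distance(signal, prototypes[label]))


# Hypothetical repertoire: each label stands for one relevant predicate.
prototypes = {"rising": [0, 1, 2, 3], "flat": [1, 1, 1, 1]}

# A distorted version of the "rising" prototype is still assigned to it.
print(classify([0.1, 0.9, 2.2, 2.8], prototypes, squared_distance))  # rising
```

The repertoire here is fixed; a repertoire updated in the progression of time would simply replace the dictionary between calls.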


Signals and their Processing 


A signal processor, purporting to [...] or extract a signal y from a
(given) signal x, essentially scans x over a subset of its domain and in
each scan, based, in general, on a fixed and/or finite number of
arguments derived from the signals x and y in accordance with the scan
position, evaluates the assignment to the signal y, at the corresponding
scan position, as a function of the arguments (derived from x and y).
Symbolically, the preceding statement can be expressed concisely as in
the following. An illustration too accompanies, in Figure 2.1, for
clarifying the statement.

Figure 2.1: Essential structure of a signal processor

³ Signal processing is also desired in the sense of mapping the incident
signal to one which has its assignments relatively better organized.
This problem, essentially similar to that of signal classification,
addresses the issue of extracting information (in the sense of
predicates) from a given signal, and processors employed for this
purpose are termed filters. Estimation of signals too is based on
similar principles and hence, in the ensuing discussion, I will use the
terms filters, processors and estimators interchangeably so as not to
lose generality.



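\[
x\colon \Xi \to \mathcal{X}, \qquad y\colon \Theta \to \mathcal{Y},
\]
\[
\forall \theta \in \Theta \;\; \exists\, \mathcal{N}_x(\theta) \subset \Xi,\;
\mathcal{N}_y(\theta) \subset \Theta \tag{2.1a}
\]
such that
\[
\mathcal{N}_x(\theta) \cup \mathcal{N}_y(\theta) \neq \emptyset, \qquad
a_x(\theta) = x|_{\mathcal{N}_x(\theta)} \in \mathfrak{A}_x, \qquad
a_y(\theta) = y|_{\mathcal{N}_y(\theta)} \in \mathfrak{A}_y,
\]
and
\[
y(\theta) = f\bigl(\varphi(a_x(\theta), \theta),\, \psi(a_y(\theta), \theta),\, \theta\bigr), \tag{2.1b}
\]
where
\[
\varphi\colon \mathfrak{A}_x \times \Theta \to \mathfrak{B}_x, \qquad
\psi\colon \mathfrak{A}_y \times \Theta \to \mathfrak{B}_y,
\]
and
\[
f\colon \mathfrak{B}_x \times \mathfrak{B}_y \times \Theta \to \mathcal{Y}.
\]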



In the above expressions, I use the following notations.

$\Xi$ := Domain of definition of the signal x. Scanning of the signal
over this domain is indicated by the progression of $\xi$.

$\Theta$ := Domain of definition of the signal y. Scanning of the signal
over this domain is indicated by the progression of $\theta$.

$\mathcal{X}$ := Range space of the signal x.

$\mathcal{Y}$ := Range space of the signal y.

$\mathfrak{A}_x$ := Algebraic structure (of appropriate kind) on
$\mathcal{X}^{\Xi}$, the space of all signals from $\Xi$ to
$\mathcal{X}$. ($\mathfrak{A}_x$ contains subsets of
$\mathcal{X}^{\Xi}$.)

$\mathfrak{A}_y$ := Algebraic structure (of appropriate kind) on
$\mathcal{Y}^{\Theta}$, the space of all signals from $\Theta$ to
$\mathcal{Y}$. ($\mathfrak{A}_y$ contains subsets of
$\mathcal{Y}^{\Theta}$.)

$\mathfrak{B}_x$ := Space of measurements on signal x. (Possibly the
same as $\mathfrak{A}_x$.)

$\mathfrak{B}_y$ := Space of measurements on signal y. (Possibly the
same as $\mathfrak{A}_y$.)

$\mathcal{N}_x(\theta)$ := Neighbourhood structure in $\Xi$ at the scan
position $\theta$; $\mathcal{N}_x(\theta) \subset \Xi$ for all
$\theta \in \Theta$.

$\mathcal{N}_y(\theta)$ := Neighbourhood structure in $\Theta$ at the
scan position $\theta$; $\mathcal{N}_y(\theta) \subset \Theta$ for all
$\theta \in \Theta$.

$a_x(\theta)$ := Assignments of x over $\mathcal{N}_x(\theta)$. (Note
that $a_x(\theta)$ is itself a signal, defined on
$\mathcal{N}_x(\theta)$, and identifies an ordered subset (ordered by
$\xi$) of the assignments of x, such that
$a_x(\theta) \in \mathfrak{A}_x$ for all $\theta \in \Theta$.)

$a_y(\theta)$ := Assignments of y over $\mathcal{N}_y(\theta)$. (Note
that $a_y(\theta)$ is itself a signal, defined on
$\mathcal{N}_y(\theta)$, and also identifies an ordered subset (ordered
by $\theta$) of the assignments of y, such that
$a_y(\theta) \in \mathfrak{A}_y$ for all $\theta \in \Theta$.)

$\varphi$ := Indexed collection of measures⁴ on $\mathfrak{A}_x$,
indexed by $\theta \in \Theta$.

$\psi$ := Indexed collection of measures on $\mathfrak{A}_y$, indexed by
$\theta \in \Theta$.

$f$ := Mechanism (method) by which the evaluation of assignments to y is
arrived at. This includes correlations between the signals x and y.
The above formal statement⁵ is sufficiently general to allow each of
$\Xi$, $\Theta$, $\mathcal{X}$ and $\mathcal{Y}$ to be either discrete
or continuous, numerical or symbolic and scalar or vector collections
(generally vector spaces) and captures the essential traits of nearly
all of the distinct signal processor kinds. Signals are abstracted as
functions (or processes) and the role of a signal processor is to
establish associations between such abstract entities as depicted in the
following.

⁴ More precisely, $\varphi$ and $\psi$ are product measures on the
algebraic structures $\mathfrak{A}_x$ and $\mathfrak{A}_y$ respectively.
These functions measure an appropriate (desired) aspect of the relative
organization of assignments in the signals x and y, ie, $\varphi$ and
$\psi$ are predicates compatible to signals x and y.

⁵ In the present form, this statement dictates the processing model to be
of the data driven kind, ie, processing is initiated only on the
relevant portion, viz $a_x(\theta)$ and $a_y(\theta)$, of the signals x
and y being available. (See Gorsline, 1986, for a discussion on the data
driven computational model.) The functional relationships easily suggest
correspondences, if not an isomorphism, with the formalism of Turing
Machines, common in the theory of computation.

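[Diagram (2.2): association established by the processor between the
signals x and y; arrow labels not recoverable from the source.]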

Signal processors, as described above, belong to the class of abstract
dynamical systems when $\mathcal{N}_y(\theta) \neq \emptyset$ and the
influence on y of the assignments $a_y(\theta)$, for all
$\theta \in \Theta$, through the map $\psi$ is not null.
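
The scanning rule (2.1b) can be illustrated by a minimal sketch over finite sequences. Everything named here is an assumption made for illustration: the function `scan_processor`, the use of integer scan positions, and the two example processors (a two-tap moving sum with an empty neighbourhood in y, and a running accumulator whose neighbourhood in y is the previous scan position). The second example shows the dynamical case, in which earlier assignments of y influence later ones.

```python
def scan_processor(x, nx, ny, phi, psi, f):
    """Evaluate y(theta) = f(phi(a_x, theta), psi(a_y, theta), theta) in scan order.

    nx(theta), ny(theta) give the neighbourhoods as lists of indices;
    indices falling outside the available signal are ignored.
    """
    y = []
    for theta in range(len(x)):
        # a_x(theta): assignments of x over the neighbourhood N_x(theta)
        a_x = [x[i] for i in nx(theta) if 0 <= i < len(x)]
        # a_y(theta): assignments of y (already evaluated) over N_y(theta)
        a_y = [y[j] for j in ny(theta) if 0 <= j < len(y)]
        y.append(f(phi(a_x, theta), psi(a_y, theta), theta))
    return y


measure = lambda a, theta: sum(a)            # phi and psi: a simple linear measure
combine = lambda m_x, m_y, theta: m_x + m_y  # f: adds the two measurements

x = [1, 2, 3, 4]

# N_y empty: the output depends on x alone (no feedback).
moving_sum = scan_processor(x, lambda t: [t - 1, t], lambda t: [],
                            measure, measure, combine)

# N_y = {theta - 1}: earlier outputs feed back, giving a dynamical system.
accumulator = scan_processor(x, lambda t: [t], lambda t: [t - 1],
                             measure, measure, combine)

print(moving_sum)   # [1, 3, 5, 7]
print(accumulator)  # [1, 3, 6, 10]
```

Restricting both examples to the same `measure` and `combine` makes the role of the neighbourhoods explicit: only the specification of N_x and N_y differs between the two processors.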

Contextual information contained in the signals x and y is formally
captured through the algebraic structures $\mathfrak{A}_x$ and
$\mathfrak{A}_y$ respectively, and the functions $\varphi$ and $\psi$
indicate the nature of evaluation (measurement) of information content
in signal regions specified by the neighbourhoods
$\mathcal{N}_x(\theta)$ and $\mathcal{N}_y(\theta)$ respectively. It is
not uncommon to find the algebraic structures $\mathfrak{A}_x$,
$\mathfrak{A}_y$ and the spaces of measurements $\mathfrak{B}_x$,
$\mathfrak{B}_y$ being of the nature of (algebraic) fields. Note that
this formalism encourages a recursive (at times, circular) understanding
of signal processors, ie, each of the components of the above formalism,
specifically $\varphi$, $\psi$ and $f$, are valid candidates for being
considered as signal (information) processors. It is also important to
note that specification (and the structure) of the neighbourhoods
$\mathcal{N}_x(\theta)$, $\mathcal{N}_y(\theta)$ and of the functions
$\varphi$, $\psi$ and $f$ are crucial to signal processor design,
specification and classification. These issues translate to those of
signal and processor representation, noting the recursiveness involved
in the understanding of signal processors.


Requirements of Signal Processors


Signals are functions aimed at a representation of perceptual entities
in the 'reality' around us. The role of signal processors is to enhance,
or protect from possible deterioration, the perceptual qualification of
signals obtained from data gathered through sensors, or measurement
apparatus, specially in situations involving signal translocation
through space and/or time,⁶ possibly through media introducing
distortions.

In this role, processing is viewed as a means for guiding perceptual
categorization and, in the context of automation (including that of
intelligence), is expected to reduce the cognitive burden of human
participants. This latter requirement, typical in situations involving
the control/coordination of complex systems, immediately translates to
an endeavor seeking a normalized performance of the processors, the
criterion of normalization being the satisficability⁷ of the outcome of
processing to 'average' participants in the efforts of automation.


⁶ Translocation of signals through space (and, inevitably, in time) forms
the crux of communication (see Cherry, 1957, for the essential nature of
the problem of communication, and a glimpse of its theoretical
structure) and is commonly studied as signal transmission. Signal
transportation in time, without appreciable variation in spatial
location, is commonly experienced in situations involving storage and
retrieval. In both of the extreme forms, physical considerations
disallow distortion free signal handling.

⁷ Satisficability refers to the exigency of compelling the processor to
provide the desired and/or idealized performance within tolerable
limits. While this requirement could be mathematically formulated as a
minimization, within the limits of tolerance, of the deviation of
processor performance from that desired, or idealized, an inevitable
element of subjectivity is encountered in deciding the limit of
tolerance. The term satisficability has been chosen to signify what has
come to be known as the subjective element involved in approximation. I
acknowledge Prof VP Sinha for introducing me to this term.

A requirement additionally encountered in the design of processors
is to relate the processing steps to prevailing conceptual categories,
typically feature extraction (incorporating dimension reduction),
feature discrimination and concept aggregation, of information
processing by (average) humans: such a signal processing mechanism is
then touted as a model for mental activity (intelligence). In passing,
it is important to note that the average (or stereotypic) human
participant in information processing is only a postulate and no
candidate need be expected to satisfy the criterion of averageness.

The criterion of satisficability to 'average' individuals is commonly
expressed mathematically in terms of function approximation. Perceptual
entities, as mentioned earlier, are conveyed (represented) in signals
through the relative organization of assignments over relevant domains.
In terms of the formal statement presented in the preceding article, the
localized assignments $a_{(\cdot)}$ relate, in terms of structural
constraints (as measured by $\varphi$ and $\psi$ on signals x and y,
respectively), to perceptual entities.

Function realization being the essential nature of signal processors,
the context in which the realization is being sought influences the
requirements of processing. While function realization translates to a
requirement of exact reconstruction in signal processors between symbol
spaces, the requirement in signal processors defined between continuous
spaces of numbers is always one of obtaining an approximation: both
variants are relevant criteria of satisficability. Such a difference in
processing requirements stems from the fact that while in continuous
spaces of numbers a natural notion of proximity, or neighbourhood, is
appreciated regardless of the specific details of the space, no such
universal notion can be associated with symbol spaces.

The (un)satisficability criterion governing the choice of assignments
to the output signal given an input signal (ie, the design of the
processing rule) is mathematically expressed in terms of a measure of
mismatch (a generalized distance) between the output signal and an
idealized (or expected or desired) form of processing on the input
signal: the measure of mismatch is designed, or chosen, to incorporate
the specific aspects of the perceptual entities that need to be
preserved in the signal processing operation. Referring to the notations
in the formal statement in the previous article, the symbolic form of
the (un)satisficability criterion is

\[
\min \bigl\| s\bigl(y(\theta), a_y(\theta), \theta\bigr) \bigr\|, \tag{2.3}
\]

where $s\colon \mathcal{Y} \times \mathfrak{A}_y \times \Theta \to \mathfrak{S}$.

In the above expression, $s$ refers to the measure of mismatch, ie,
(un)satisficability, $\mathfrak{S}$ denotes the repertoire of distinct
labels (possibly numbers) used to distinguish the possible mismatches
and $\|s(\cdot,\cdot,\cdot)\|$ denotes the accumulation of
(un)satisficability over all the individual assignments (to signal y).
The latter operation serves to express, through a single number
(generally integer or real), a measure of (un)satisficability for the
entirety of the outcome of the processing operation: the desire to have
this measure expressed as a single number stems from the need to order,
given the natural ordering available in one-dimensional spaces, the
possible outcomes of a processor for a given input signal and to select
the best alternative, in the sense of minimum mismatch. It is not
uncommon to find the operator $\|\cdot\|$ satisfying the axioms of a
norm and this explains the associated notation.

The generic forms of the (un)satisficability criterion designed to
measure the mismatch of the output signal with an a priori specified
desired signal, or an idealized (or expected) form of processing on the
given input signal are, respectively, indicated in the following.

\[
s\bigl(y(\theta), a_y(\theta), \theta\bigr) = \rho\bigl(y(\theta), y_d(\theta)\bigr), \tag{2.4}
\]
\[
s\bigl(y(\theta), a_y(\theta), \theta\bigr) = \rho\bigl(y(\theta), g(a_y(\theta), \theta)\bigr), \tag{2.5}
\]

where $y_d$ denotes the desired form of signal on processing, $g$ is an
appropriate function (possibly incorporating $\psi$) specifying the
idealized (or expected) form of processing needed and $\rho$ indicates
the mechanism by which comparison between the output signal and the
desired or idealized signal forms is achieved. In functional analytic
terms, $\rho$ is, generally, a (semi)metric, ie, a metric (distance
function) with the axiom of unsignedness relaxed. Processors with the
first form of (un)satisficability criterion (ie, matching with desired
signals) are termed supervised and those with the other form are termed
semi-supervised.⁸
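
The two forms (2.4) and (2.5), together with the accumulation of (2.3), can be sketched as follows. This is a minimal illustration under stated assumptions, none of which come from the thesis: the comparison ρ is taken to be the absolute difference, the accumulation is taken to be the Euclidean norm, and the idealized processing g is a hypothetical smoothness expectation (each output should equal the previous output).

```python
import math


def rho(u, v):
    """Assumed comparison mechanism: absolute difference of two scalars."""
    return abs(u - v)


def supervised_mismatch(y, y_d):
    """Form (2.4): compare each output with an a priori desired output."""
    return [rho(a, b) for a, b in zip(y, y_d)]


def semi_supervised_mismatch(y, a_y, g):
    """Form (2.5): compare each output with g applied to neighbouring outputs."""
    return [rho(y[t], g(a_y(t), t)) for t in range(len(y))]


def accumulate(s):
    """Assumed accumulation ||s||: Euclidean norm of the per-position mismatches."""
    return math.sqrt(sum(v * v for v in s))


y = [1.0, 2.0, 4.0]

# Supervised: a desired signal y_d is given explicitly.
print(accumulate(supervised_mismatch(y, [1.0, 2.0, 1.0])))  # 3.0

# Semi-supervised: the expectation is expressed through the output itself.
previous = lambda t: y[t - 1] if t > 0 else y[0]   # a_y(theta): previous assignment
expected = lambda a, t: a                          # g: output should repeat it
print(semi_supervised_mismatch(y, previous, expected))      # [0.0, 1.0, 2.0]
```

Collapsing the per-position mismatches into one number is what permits the candidate outcomes to be ordered and the minimum selected.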

⁸ Processors without explicit supervision are, in general, termed
unsupervised, though this term is not altogether appropriate.

Automatic minimization of the measure of mismatch (s), formulated,
equivalently, as a problem of search in the space of admissible
solutions, is commonly addressed, specially in signal processing, as a
variant of (stochastic) gradient descent and, consequently, s is
expected to be defined between continuous spaces exhibiting
differentiability (almost) everywhere. The (un)satisficability criterion
is commonly formulated to have a quadratic variation in the numerical
values of the desired and realized signal assignments (essentially,
minimization is on the $L^2$- or $\ell^2$-norm⁹ of error) and, from
physicalist considerations, this criterion is given an interpretation of
energy.

Though reasonably simple in concept, gradient descent based approaches
have come in for sharp criticism due to an inherent (and inevitable)
lack of speedy convergence and the undesirable feature of search seeking
out 'locally optimal solutions' with a likelihood no less than that of
seeking 'globally optimal solutions': locally optimal solutions are
understood in the sense of the measure of the region around candidate
solutions (ie, those meeting the satisficability criterion) relative to
that of the space of admissible solutions. Convergence, equivalent to
termination, of the search procedure is, in general, in a weak sense and
can be assured only when the mismatch is evaluated as a quadratic
function of the desired and realized signals.
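
Gradient descent on a quadratic mismatch can be sketched as below. The sketch assumes a hypothetical linear processor y = w x + b, a mean squared error criterion, and plain batch (not stochastic) updates; the function name `gradient_descent_fit`, the step size and the data are illustrative choices, not values from the thesis.

```python
def gradient_descent_fit(xs, ys, steps=2000, rate=0.05):
    """Minimize the mean squared error of y = w*x + b by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of (1/n) * sum((w*x + b - y)^2) with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= rate * grad_w
        b -= rate * grad_b
    return w, b


# Desired outputs generated by y = 2x + 1; the quadratic criterion has a
# single minimum, so the search settles on it.
w, b = gradient_descent_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(round(w, 4), round(b, 4))  # 2.0 1.0
```

With a quadratic criterion in the parameters the error surface has no spurious local minima, which is the situation in which convergence can be assured; the step size must still be small enough for the iteration to be stable.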

^9 Note that s is a characterization of approximation error and, hence, the error is
expected to belong to a continuous space. However, the signal could be defined on a space
which is either discrete or continuous: in the former case the term sequence is more
frequently used. Minimization of s is thus with respect to the $L^2$ norm when the signal
definition domain is continuous, and the $l^2$ norm when discrete.

Several schemes, accompanied by claims of superiority over gradient
descent, have been suggested to conduct (automated) search, the latest
being genetic algorithms (Goldberg, 1989), distinguished from other
evolutionary approaches in their applicability to abstract search
problems, obviating the requirement that s be defined over continuous
spaces, or even that it be differentiable. Relatively more convincing
assurance of seeking globally optimal solutions and speedier
convergence, despite the absence of a firm theoretical basis for such
claims, characterize the approach of genetic algorithms. Search being
an essential component of training in (artificial) neural networks,
traditional formulations of learning as variants of gradient descent in the
space of (admissible) weights are currently being reinvestigated in the
framework of evolutionary programming, typically genetic algorithms.
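The point that such search needs neither continuity nor differentiability of s can be made concrete with a small sketch. What follows is a generic genetic algorithm (tournament selection, one-point crossover, bit-flip mutation, elitism), not necessarily the scheme of Goldberg (1989); the mismatch measure, the bit-string encoding and all parameter values are assumptions chosen for illustration.

```python
import random


def mismatch(bits, target):
    """Number of positions where a candidate disagrees with the target (no gradient exists)."""
    return sum(b != t for b, t in zip(bits, target))


def genetic_search(target, pop_size=30, generations=60, p_mut=0.02, seed=0):
    """Evolve bit strings towards low mismatch; returns the best candidate and a history."""
    rng = random.Random(seed)
    n = len(target)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda c: mismatch(c, target))
        history.append(mismatch(pop[0], target))
        nxt = [pop[0][:]]  # elitism: the best candidate survives unchanged
        while len(nxt) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(pop, 3), key=lambda c: mismatch(c, target))
            p2 = min(rng.sample(pop, 3), key=lambda c: mismatch(c, target))
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per position.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=lambda c: mismatch(c, target))
    return best, history


target = [1, 0] * 8
best, history = genetic_search(target)
```

Because the best candidate is always carried over, the recorded mismatch never increases from one generation to the next; nothing in the sketch guarantees that the global optimum is reached.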


Signal Processor Types 

Processors wherein the operations $\phi$, $\psi$, $f$ are all linear and the
neighbourhoods $\mathcal{N}_x(\theta)$ and $\mathcal{N}_y(\theta)$, for all
$\theta \in \Theta$, are imposed through delay, or shift, operations are
termed linear: violation of any of these stipulations implies that the
processor is nonlinear. If the measurement functions $\phi$ or $\psi$ evaluate
the signal assignments $a_x(\theta)$ and $a_y(\theta)$, respectively,
depending on the position $\theta$, then a processor realized with such
measurement functions is termed adaptive. Processors wherein the signal
evaluation mechanism $f$ is independent of $\theta$ are termed
shift-invariant, or translation-invariant, and it is pointless to seek such
an invariance when adaptivity is incorporated.

If no element of random variation is incorporated in the construction
of neighbourhoods ($\mathcal{N}_x(\theta)$, $\mathcal{N}_y(\theta)$) and
functions ($\phi$, $\psi$, $f$) then processors incorporating such
components are termed deterministic, else stochastic. A new element of
description is introduced in stochastic processors, that of distributions,
ie, a characterization of invariances in the likelihood of relative
organization of assignments in a signal, possibly in relation with other
desired, or a priori chosen, signals.

The formalism in Equation 2.1 (p. 39) is capable of representing
stochastic processors when the algebraic structures $\mathfrak{A}_x$ and
$\mathfrak{A}_y$ are Borel (sigma) fields and when the functions $\phi$,
$\psi$ and $f$, which are to be designed to capture the relevant (desired)
distributions, satisfy the axioms of probability measures. In such
processors, the members of $\mathfrak{A}_x$ and $\mathfrak{A}_y$ are
commonly termed events. Shift invariance is generally considered in terms
of stationarity; however, this invariance is qualified by the particular
distributions of interest in view of the fact that processing is
characterized by distributions.

Signal processors for which $\Xi = \Theta$ are termed filters: in this class
the input and output signals (x and y respectively) are described on
the same domain. Processors functioning as filters are termed causal^10
if the common signal domain $\Theta$ (= $\Xi$) is a partially ordered set
(with respect to an appropriate (precedence) relation denoted by $\preceq$)
and for all $\theta_1, \theta_2 \in \Theta$, $\theta_1 \preceq \theta_2$
implies $\mathcal{N}_x(\theta_1) \preceq \mathcal{N}_x(\theta_2)$ and
$\mathcal{N}_y(\theta_1) \preceq \mathcal{N}_y(\theta_2)$: implicit is the
assumption that the precedence relation $\preceq$, defined on $\Theta$, is
carried over to the collection of neighbourhoods for signals x and y.

^10 Causality is, incidentally, natural only when the signal definition domain is
one-dimensional and has the interpretation of time.

In most filtering operations, elements of $\Theta$ (= $\Xi$) are given
connotations of time and/or spatial co-ordinates and, consequently, the
precedence relation $\preceq$ has the natural interpretation of historicity.
Anticipatory systems (cf, Rosen, 1985) are those filters whose
neighbourhood structures defy the stricture of causality: in such systems
it is common to interpret the processing as being influenced by signal
assignments of an unvisited future. It is interesting to note that signals
are identifiable with filters (processors), thereby reducing the
distinction between signals and (signal processing) systems. Filters with
elements of randomness are termed stochastic processes when the signal
definition domain is one-dimensional, and random fields otherwise (ie,
higher-dimensional signal definition domain).
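The distinction between causal and anticipatory neighbourhoods can be sketched for a discrete one-dimensional domain ordered by time. The two averaging filters below are illustrative assumptions (they are not processors defined in this chapter): the first evaluates each output only from present and past samples, the second also reaches into future samples.

```python
def causal_average(x, width):
    """Output at n depends only on x[n], x[n-1], ..., x[n-width+1]."""
    out = []
    for n in range(len(x)):
        past = x[max(0, n - width + 1): n + 1]
        out.append(sum(past) / len(past))
    return out


def centred_average(x, half_width):
    """Output at n also depends on x[n+1], ..., x[n+half_width] (future samples)."""
    out = []
    for n in range(len(x)):
        span = x[max(0, n - half_width): n + half_width + 1]
        out.append(sum(span) / len(span))
    return out


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x_changed = x[:4] + [40.0, 5.0]  # alter only the sample at position 4

# Outputs at positions 0..3 are compared before and after the change.
causal_same = causal_average(x, 3)[:4] == causal_average(x_changed, 3)[:4]
centred_same = centred_average(x, 1)[:4] == centred_average(x_changed, 1)[:4]
```

Changing a later sample leaves the earlier causal outputs untouched, whereas the centred filter's output at position 3 changes, which is the sense in which its processing is influenced by an unvisited future.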

Transforms, in signal processing, are defined to be situations of
processing established between non-identical domains of signal definition
(ie, $\Xi \neq \Theta$) and it is common to force
$\mathcal{N}_y(\theta) = \emptyset$ for all $\theta \in \Theta$. Classical
transforms are characterized by $\mathcal{N}_x(\theta) = \Xi$ for all
$\theta \in \Theta$, while the approach in window transforms (including
wavelet transforms) is to seek out a neighbourhood structure of the form:
$\forall\, \theta_1, \theta_2 \in \Theta$, $\theta_1 \neq \theta_2$ implies
$\mathcal{N}_x(\theta_1) \neq \mathcal{N}_x(\theta_2) \neq \emptyset$.
Usually, $\Xi$ is associated with time and/or spatial-extent and $\Theta$
with frequency or repeatability of (specific) relative organization of
assignments and the signal y is, generally, termed the spectrum of the
signal x. A reversal of interpretations is used in inverse transforms.

In the representation of signals and linear (shift invariant) proces- 
sors, transforms play an important role. The transform (or spectral) 
approach simplifies an operation of convolution to point-wise multipli- 
cation (with non-window transforms like Fourier transform, Laplace 
transform, or their variants): this feature allows the processor func- 
tionality to be completely characterized in terms of its impulse response. 
Finiteness of the impulse response is used to characterize (linear) pro- 
cessors, specifically filters. 
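The reduction of convolution to point-wise multiplication can be checked numerically. The sketch below uses the discrete Fourier transform and circular convolution as a finite stand-in for the Fourier transform mentioned above; the signal `x` and the finite impulse response `h` are assumed example data.

```python
import cmath


def dft(x):
    """Discrete Fourier transform of a finite sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def idft(X):
    """Inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]


def circular_convolution(x, h):
    """Direct (signal domain) circular convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[k] * h[(i - k) % n] for k in range(n)) for i in range(n)]


x = [1.0, 2.0, 0.0, -1.0]
h = [0.5, 0.25, 0.0, 0.0]  # a finite impulse response (assumed)

direct = circular_convolution(x, h)
# Spectral route: multiply the two spectra point-wise, then invert.
spectral = [v.real for v in idft([a * b for a, b in zip(dft(x), dft(h))])]
```

Both routes give the same output sequence up to rounding, so the impulse response `h` (equivalently its spectrum) is all that is needed to describe this linear shift-invariant processor.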

Windowed transforms, essentially (conventional) transforms applied
to local signal sections, introduce a spectral dependence on shifts (of
windows, described through the measurement function $\phi$) in the signal
definition domain. This dependence motivates window transforms to
be viewed as members of the general class of Spatial-Spectral
processors, of which time-frequency processors (cf, Cohen, 1989) and
processors involving conjoint spatial/spatial-frequency (energy)
distributions (cf, Wechsler, 1990) are important special cases. Joint
characterization of processors, on the signal definition and spectral
domains, has been immensely useful in processing signals with
non-stationarity in statistics and also when signals are to be subjected,
as in Wavelet Transforms, to non-uniform processing, possibly without
shift-invariance.

Spatial-Spectral processors are represented in the formalism of
Equation 2.1 (p. 39) with the following refinement of symbols. Identify
with $\Theta$ two domains $\hat{\Theta}$ and $\hat{\Xi}$, such that
$\Theta = \hat{\Xi} \times \hat{\Theta}$, where $\hat{\Xi}$ is derived
from $\Xi$ and $\hat{\Theta}$ is given spectral connotations like
frequency, or scale (Wavelet Transforms). The variable $\theta$ is
identified with the ordered 2-tuple $(\hat{\xi}, \hat{\theta})$ with
$\hat{\xi} \in \hat{\Xi}$ and $\hat{\theta} \in \hat{\Theta}$. Note that
the neighbourhoods ($\mathcal{N}$) and assignments ($a$) related to signals
x and y are now indexed by $\hat{\xi}$ and $\hat{\theta}$. Rephrase the
measurement functions $\phi$ and $\psi$ as

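$$\phi(a_x(\theta), \theta) = \phi(a_x(\hat{\xi}, \hat{\theta}), \hat{\xi}, \hat{\theta}) = \hat{\phi}(w_x(\hat{\xi}, \hat{\theta}), a_x(\hat{\xi}, \hat{\theta}), \hat{\xi}, \hat{\theta}),$$

$$\psi(a_y(\theta), \theta) = \psi(a_y(\hat{\xi}, \hat{\theta}), \hat{\xi}, \hat{\theta}) = \hat{\psi}(w_y(\hat{\xi}, \hat{\theta}), a_y(\hat{\xi}, \hat{\theta}), \hat{\xi}, \hat{\theta}),$$

with appropriate choice of component functions $\hat{\phi}$, $\hat{\psi}$,
$w_x$ and $w_y$. In the above symbolization, $w_x(\hat{\xi}, \hat{\theta})$
and $w_y(\hat{\xi}, \hat{\theta})$ denote the window functions operating on
x and y respectively. In window transforms, it is common to find
$\hat{\Xi} = \Xi$, and members of $\hat{\Xi}$ signify the amount of
translation the (weighting) windows are subjected to.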

Gabor Transforms and Wavelet Transforms are the most commonly
used window transforms in signal processing. Noting that
$\mathcal{N}_y(\hat{\xi}, \hat{\theta}) = \mathcal{N}_y(\theta) = \emptyset$
for all $(\hat{\xi}, \hat{\theta}) \in \hat{\Xi} \times \hat{\Theta}$, in
the case of (window) transforms, the processor is specified by the simpler
abstraction

wherein recursive (self-referential) dependence of y has been suppressed.
Gabor transforms are characterized by

(ailmo = c-«XT(Ov^es, 

2s/Tr<; 

v[l§)eEx[(-)], 

where $j^2 = -1$ and $\varsigma > 0$ is a constant uninfluenced by choices in
translational ($\hat{\xi}$) or spectral ($\hat{\theta}$) parameters. (Note
that $w_x(\hat{\xi}, \hat{\theta}) = 1$ allows Fourier transforms to be
computed.)

In contrast, Wavelet Transforms have the first two of the above 
expressions as 


(aiUmo = 

v^€.r, 

u 

where b denotes a basic wavelet^11 window. The basic wavelet window
is commonly obtained from a scaling function ip as the solution of the 


^11 In the literature, basic wavelets are denoted as $\psi$ and scaling functions, from
which the wavelets are derived by scaling and shifting, are denoted as $\phi$. As these
symbols have already been used to denote abstract signal processing steps, I have opted
to recode the notations to preserve conceptual clarity in the symbols.

difference equations^12

*foo 

}— ™o(,t 

f" oc* 

i = -.oo 

the sequences p and q being called two scale sequences. 

As localized evaluation is the key idea of windowing, transforms 
based on wavelets (subjected to scaling and translation) are consid- 
ered superior when the extent of localization is to adapt in accordance 
with signal variation over the domain of definition. (In Gabor trans- 
forms, localization of signal evaluation is invariant in the sense that 
no scale parameter is incorporated.) Window transforms, by virtue of 
localization in signal assessment, have been used in time-frequency 
(spatial/spatial-frequency) analysis wherein the focus is to trace (de- 
tect) variations in signal characteristics, typically expressed through 
terms having spectral connotation, as a function of signal evolution in 
time (space). An important point to note is that in window transforms, 
localization normally is effective in the signal definition as well as
spectral domains and is unbiased towards the dimension of these spaces, ie,
localization is operative on all the basis vectors describing the signal.
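The tracing of spectral content along the signal definition domain can be sketched with a discrete, Gabor-like windowed transform. The Gaussian window below uses the normalization $1/(2\sqrt{\pi\varsigma})$ with an exponent of the usual Gaussian form; that exponent, the discrete summation, the parameter name `spread` (standing in for the constant $\varsigma$) and the test signal are all assumptions of this sketch rather than the characterization given above.

```python
import cmath
import math


def gaussian_window(u, spread):
    """Gaussian window exp(-u^2 / (4 * spread)) / (2 * sqrt(pi * spread))."""
    return math.exp(-u * u / (4.0 * spread)) / (2.0 * math.sqrt(math.pi * spread))


def gabor_coefficient(x, shift, freq, spread=4.0):
    """Windowed Fourier sum: x[k] * window(k - shift) * exp(-j * freq * k) over k."""
    return sum(x[k] * gaussian_window(k - shift, spread) * cmath.exp(-1j * freq * k)
               for k in range(len(x)))


# A signal whose dominant frequency changes half way through (assumed example).
signal = [math.cos(0.3 * k) if k < 64 else math.cos(1.2 * k) for k in range(128)]

early_low = abs(gabor_coefficient(signal, 32, 0.3))
early_high = abs(gabor_coefficient(signal, 32, 1.2))
late_low = abs(gabor_coefficient(signal, 96, 0.3))
late_high = abs(gabor_coefficient(signal, 96, 1.2))
```

With the window centred in the first half the low frequency dominates, and with the window centred in the second half the high frequency dominates. The window width is fixed here, which is the invariance of localization noted for Gabor transforms; a wavelet-based sketch would instead rescale the window with the spectral parameter.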


^12 Similar to two scale relationships, general n scale relationships (n = 2, 3, ...), while
conceivable, have not yet become popular and hence the general difference equations are
not indicated.

2.2 Artificial Neural Networks: A preparatory review


Research in artificial neural networks, in a span of a little over a half
century, has had the issues related to the modelling of (human)
perceptual activities, specifically the information processing sought to be
carried out by the brain, as a continuing area of major interest: one of
the motivations for such a focus is to eventually enable an automated
expression of intelligence. In this focus, the key problems addressed
have been the identification (and isolation) of plausible structures
capable of information processing, typically categorization, generalization
and estimation, especially when the patterns presented (for processing)
are noisy and/or incomplete; information retention with associated
issues of retrieval and recall, in particular models accounting for short
and long term memories; and automated mechanism(s) of incorporating
available knowledge into suggested structures of information processing
and/or storage: this latter problem is known as learning.

The underlying paradigm in artificial neural networks is to realize
all of the above (perceptual) capabilities through an interconnected
ensemble of basic processing units; these units are themselves not
expected to exhibit any of the desired functional traits. Such an
operational characteristic encourages the view that neural networks (of the
natural as well as artificial kinds) allow for an 'emergence' of
computational functionality through the framework of interconnections; the
notion of generalization is not unrelated to that of emergence. Information
retention, retrieval and recall have been sought through interconnected
hierarchical dynamical systems, whereas information processing has
been accounted for in a variety of architectures, each attempting to
recreate a distinct facet of biological information processing.

An unbroken tradition of research in artificial neural networks has
been the parametrically selectable processing character of the
constituent units: the response of a node to information incident on the
input channels (ie, dendritic arborescence in real world neurons) is
influenced by the specific weightages (parameters) associated with the
channels and the parameters are tuned (selected) in accordance with
the specific (desired) knowledge base to be represented, or learnt.^13 In
all of the presentations of neural network research, (information
bearing) patterns of activity presented to the processing nodes, weightages
associated with information incident on input channels and the
response of the basic information processing units are encoded using real
numbers (at times integers); this encoding is used for a conjoint
representation of coordinates and objects in the perceived reality around us.
Each processing node is then a multivariate, scalar valued (real)
function and it is common to expect, from considerations of
categorization, that this function is of a nonlinear nature.

^13 The criticism by Edelman (1987) of information processing models accounting for
intelligence rests on the essential problems of the a priori status given to information
and the agental (homuncular) status given to information association, both central to
parametric formulations of information processing.

Learning, or the automatic association, with processing nodes, of
connection strengths capable of representing the desired knowledge
base, has presented the most challenging of problems in neural network
investigations. Nearly all formulations of the learning problem seek
to search for an optimal choice of parameters in an appropriate space;
the criterion of optimality is generally chosen from considerations of
enabling the search to be approached as a variant of gradient descent.
The nonlinear nature of the functional relationship between inputs
and output of the processing node compounds the search problem by
imposing a non-unimodality in the (appropriately signed) objective
function: a consequence of this lack of unimodality is that solutions to the
search problem are generally locally optimal, instead of the desired
global optimum. Retrieval and recall in dynamic, or recurrent, neural
networks being similar to the problem of learning, locally optimal
solutions spell disaster due to incomplete pattern reconstruction, ie,
inaccurate recall with no assurance that repeated attempts at similar
recall will result in similar responses. Issues of generalization, when
included along with representation (ie, learning), further complicate
the situation, rendering the learning problem intractable.

Success, even though limited, in modeling (human) cognitive abilities
has triggered a fresh wave of interest in artificial neural networks.
In this section I will review the nature of research in artificial neural
networks, principally from the point of view of neuro-science, to
facilitate a review of neural signal processing, one of the key activities under
the heading of neuro-engineering. As the focus, in artificial neural
networks, has really been on models accounting for perception, significant
attention has not been given, in the present endeavors, to the issue of
data/knowledge validity, as is to be expected in any modeling endeavor,
typically using statistical approaches.^14


Neurons and their Models 


Isolated real world neurons,^15 used as the functional basis of
artificial neural networks, are formally modeled by expressions having the
general form

= -a(7fe0) . (2.10a) 

y{xj) = crivixj) ,t); (2.10b) 

the earlier of these equations is equivalent to the statement 
HzJ) - (b{v{T,t)) - S‘x) , ' = 


^14 Thus, detection of outliers in the training data, robustness of the neural procedures in
information processing and robustness of learning algorithms have not yet been
considered sufficiently important, though, as the paradigm of neural networks is increasingly
attracting the attention of statisticians, these issues will tend to dominate future
discourses on neural networks. Algorithmic issues, principally that of complexity, currently
addressed sporadically, are bound to highlight, in subsequent times, the characteristics
of computation with artificial neural networks.

^15 In this discussion, I will dwell only on the mathematical models suggested for (and
inspired from) biological neurons. The anatomical and physiological basis of models for
neurons and their interconnected ensembles can be found in the biologically grounded
discussions provided by Churchland (1986), McClelland, Rumelhart, et al (1986a),
Peretto (1992) and Churchland & Sejnowski (1994).

and the latter is sufficiently general to allow incorporation of refractory
times (cf, Peretto, 1992) in the neural response. Neurons with the above
(internal) dynamics are said to be of the Cohen & Grossberg (1983) type.
(See, Kosko, 1992a.)

This is a deterministic model for the dynamics within neurons (ie,
elementary processors of a neural network). Figure 2.2 depicts the above
model in graphical symbols and this symbolism will be used in the
sequel to denote isolated neurons.

Additive and shunting models are the most important special cases
of the above dynamical model of a neuron. When a is a constant and
the function b is rectilinear^16 in its argument the neuron is said to have
additive dynamics (Kosko, 1992a). The model of additive dynamics,
also termed in the literature as brain-state-in-a-box (Anderson, 1983)
and conductance model, is typically (an equivalent) of the following
form.



Figure 2.2: Symbol for an isolated neuron

^16 A function $f: \Re \to \Re$ is rectilinear in its argument, say x, if f has the form
$f(x) = mx + c$, for some $m, c \in \Re$. This, incidentally, is not the same as linearity
when $c \neq 0$.
$$\tau\,\dot{\eta}(x,t) + \eta(x,t) = \sum_{i=1}^{n} w_i x_i - \theta, \qquad (2.11a)$$

$$y(x,t) = \sigma(\eta(x,t)), \qquad (2.11b)$$

where $a(\eta(x,t)) = C^{-1}$, $b(\eta(x,t)) = R^{-1}\eta(x,t) - I$, and
$s_i = R_i^{-1}$; $C$, $R$, $R_i$ and $I$ being constants; the refractory
time in neural response has been ignored and will not be considered in the
rest of the discussion.
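The behaviour of additive dynamics can be sketched numerically. The code below assumes the first-order form $\tau\,\dot{\eta} + \eta = \sum_i w_i x_i - \theta$ for a constant input pattern and integrates it with a forward-Euler step; the function name, the step size `dt`, and the example weights are assumptions made for illustration.

```python
def simulate_additive(x, w, theta, tau=1.0, dt=0.01, steps=2000, eta0=0.0):
    """Forward-Euler integration of tau * d(eta)/dt + eta = sum_i w_i x_i - theta."""
    drive = sum(wi * xi for wi, xi in zip(w, x)) - theta
    eta = eta0
    for _ in range(steps):
        # The membrane potential relaxes towards the constant drive.
        eta += dt * (drive - eta) / tau
    return eta


x = [1.0, 1.0, 1.0]          # constant input pattern (assumed)
w = [0.5, -1.0, 2.0]         # weighting values (assumed)
theta = 0.5                  # threshold (assumed)
eta_final = simulate_additive(x, w, theta)
```

After about twenty time constants the transient has decayed and the potential sits at the weighted sum of the inputs less the threshold, here 1.0, which is why attention can be restricted to the steady state when inputs vary slowly.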

If, in Equation 2.10 (p. 57), the function a is rectilinear in its
argument and b is nonlinear, shunting or multiplicative activation dynamics
results in the neuron, which represents a special case of the
Hodgkin-Huxley membrane equation (cf, Hodgkin & Huxley, 1952;
Grossberg, 1982; 1988; Cohen & Grossberg, 1987; Kosko, 1992a). This
model exhibits saturation (with increasing pattern intensity) if the
function b is a constant (ie, a trivial nonlinear function).

In this discussion, the following notations hold.^17

$\eta$ := potential accumulated on the membrane of a neuron (ie, neuron
state, also termed as post synaptic potential), $\eta \in \Re$; $\Re$ is
the real number field.

$x_i$ := activity (input) on (dendritic) channel i, $i = 1, 2, \ldots, n$, n being
the number of channels, and $x_i \in \Re$.

$t$ := (independent) variable denoting the progression of time,
$t \in [0, \infty]$.

$a$ := an abstract amplification function indicating the mechanism of
modulation (decay) of the membrane potential $\eta$, $a: \Re \to \Re$, with
the restriction that a takes non-negative values for reasons of
stability (Kosko, 1992a).

$b$ := an abstract translation function specifying the extent of state
translation in the dynamics of the membrane potential $\eta$,
$b: \Re \to \Re$.

$s_i$ := interconnection strength, or synaptic efficacy, of channel i,
$i = 1, 2, \ldots, n$, $s_i \in \Re$.

$y$ := action (response) of the neuron, physiologically associated with
the frequency of axonal spike generation, $y \in [C_-, C_+] \subset \Re$
for neurons with continuous valued outputs with appropriate
values for $C_-$ and $C_+$, or $y \in \{C_0, C_1, \ldots, C_c\}$, with a priori
values $C_i \in \Re$, $i = 0, 1, \ldots, c$, $c = 1, 2, \ldots$ being (one less
than) the number of categories, for neurons with discrete valued outputs.

$\sigma$ := activation function mapping the membrane potential $\eta$ to the
response (axonal spike frequency) y, generally using a nonlinear
method (possibly with a provision for refractory time),
$\sigma: \Re \to [C_-, C_+]$ for continuous valued neurons and
$\sigma: \Re \to \{C_0, C_1, \ldots, C_c\}$ for discrete valued neurons.

$C$ := membrane capacitance (constant amplification in conductance
model), $C > 0$.

$R$ := membrane (leakage) resistance (linear translation in conductance
model), $0 < R < \infty$.

$R_i^{-1}$ := conductance of channel i (connection strength in conductance
model), $0 < R_i^{-1} < \infty$, $i = 1, 2, \ldots, n$.

$I$ := current applied externally (static translation in conductance
model), $I \in \Re$.

$\tau = RC$ := membrane charge-discharge time constant, $0 < \tau < \infty$.

$w_i = R R_i^{-1}$ := weighting value associated with channel i, $w_i \in \Re$,
$i = 1, 2, \ldots, n$.

$\theta := RI$ = threshold of firing, $\theta \in \Re$.

Note that the abstract translation function b serves to provide a
generalization of thresholding.

^17 Note that $\theta$ has been used again in a sense different from that in the previous
section. In this section and the rest of the thesis, $\theta$ will be used to mean threshold
and/or bias associated with a neuron: the distinction will be clear from the context.

The activation function $\sigma$ is nonlinear, in general, to allow for the
decision, as a function of the membrane potential, to be non-trivial.^18 A
general requirement, from considerations of categorization and
decision-making, is that the activation function be capable of inducing a
discrimination on the membrane potential $\eta$. Common forms for the
activation function are

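hardlimiter := $\sigma_h(\eta) = \begin{cases} C_+ & \text{if } \eta \geq 0, \\ C_- & \text{otherwise,} \end{cases}$ (2.12a)

sigmoid := $\sigma_s(\eta) = \dfrac{C_+ - C_-}{1 + e^{-\eta}} + C_-$, (2.12b)

$\phantom{\text{sigmoid := } \sigma_s(\eta)} = \begin{cases} \dfrac{1}{1 + e^{-\eta}} & \text{if } [C_-, C_+] = [0, 1], \\ \tanh(\eta/2) & \text{if } [C_-, C_+] = [-1, 1], \end{cases}$ (2.12c)

where $C_-, C_+ \in \Re$, $C_- < C_+$. Monotonicity in the activation function
is considered important from the point of view of biological models
of neurons. However, in the context of neural decision making, this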

^18 In real world neurons, the activation function is sought to establish the dependence
of the frequency of axonal spike generation on the membrane potential. The higher the
frequency, the greater is the level of activity in the neuron.

condition is, at times, relaxed and discrimination with non-monotonic
activation function is typically based on functions of the form

$$\sigma_g(\eta) = (C_+ - C_-)\exp\left(-\frac{\eta^2}{2}\right) + C_-, \qquad (2.13)$$

which resemble Gaussian functions^19 normalized to have unit variance.
As monotonicity in activation functions helps preserve partial orderings
in the input space, functions used for discrimination are, in general,
piece-wise monotonic. Though this term is obviously redundant, its
usage explicitly specifies localized monotonicity, which essentially points
to localized preservation of input space partial orderings.
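The three kinds of activation function discussed above can be compared in a short sketch. The particular forms used here are assumptions: a hardlimiter switching at zero, a logistic sigmoid scaled to $[C_-, C_+]$, and a unit-variance Gaussian bump scaled likewise; the parameter names `c_minus` and `c_plus` stand for $C_-$ and $C_+$.

```python
import math


def hardlimiter(eta, c_minus=0.0, c_plus=1.0):
    """Bivalent response: c_plus at or above zero potential, c_minus below (assumed rule)."""
    return c_plus if eta >= 0.0 else c_minus


def sigmoid(eta, c_minus=0.0, c_plus=1.0):
    """Logistic sigmoid scaled to the interval [c_minus, c_plus]; monotonic in eta."""
    return (c_plus - c_minus) / (1.0 + math.exp(-eta)) + c_minus


def gaussian(eta, c_minus=0.0, c_plus=1.0):
    """Non-monotonic, even-symmetric response peaking at eta = 0."""
    return (c_plus - c_minus) * math.exp(-eta * eta / 2.0) + c_minus


potentials = [-3.0, -1.0, 0.0, 1.0, 3.0]
sigmoid_values = [sigmoid(p) for p in potentials]
gaussian_values = [gaussian(p) for p in potentials]
```

The sigmoid values increase with the potential, so the ordering of potentials is preserved, while the Gaussian values rise and then fall, preserving order only on each side of the peak. With $[C_-, C_+] = [-1, 1]$ this scaled logistic coincides with $\tanh(\eta/2)$.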

It is common to consider the transients due to input transitions as 
decaying rapidly and thus it is of interest to consider the steady state 
response of a neuron. The steady state neural model with additive 
dynamics is then of the typical form 

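$$\eta(x) = \lim_{t \to \infty} \eta(x,t) = \sum_{i=1}^{n} w_i x_i - \theta, \quad \text{as } \lim_{t \to \infty} \dot{\eta}(x,t) = 0, \qquad (2.14a)$$

$$y(x) = \lim_{t \to \infty} y(x,t) = \lim_{t \to \infty} \sigma(\eta(x,t)) = \sigma(\eta(x)), \qquad (2.14b)$$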

which resembles the formal model originally proposed by McCulloch 
& Pitts (1943), adapted, later on, by subsequent investigators. In the 
case of multiplicative dynamics with the abstract amplification function 

^19 A variation of this scheme, known as radial basis function networks, is discussed
in the literature. These networks use quadratic, rather than linear, discriminants
evaluated as a norm, generally Euclidean, of the difference between the input vector x and
appropriately chosen vectors (in the same space as x), indexed by i = 1, 2, ..., N, for some
given N. Discrimination is non-monotonic and is through functions with even symmetry,
typically Gaussian, operating on the quadratic discriminants.

being $a(\eta) = A \times \eta$ and the abstract translation function taking a form
&( 7 /) = + Bq, and the variation, if any, in the input x being 

reasonably slower than the decay of transients, the steady state model 
of the neuron is 

i]{x) =lnn7]{x,t)= , (2.15a) 

t-*oo Bq SX 

y{x) = hm y{x,t) = Imi a{7]{x,t)) = a{7]{x)) . (2.15b) 

t--*oo t— *-oo 

The steady state versions of the formal models amply suggest that
the neural state variable $\eta$ (membrane, or post-synaptic, potential) is
influenced by x, the pattern incident on the input channels, through
a projection along a vector of interconnection strengths. Topologically,
this projection implies that with bivalent^20 neurons (as expected with
hardlimiter activation function) the space (set) of input patterns is
partitioned into two distinct regions, the dichotomy being decided by a
linear manifold, ie, a (hyper)plane: the input pattern space is then said
to be linearly separated (Cover, 1965; Minsky & Papert, 1969;
Lippmann, 1987; Matheus & Hohensee, 1987).

Boolean functions being bivalent, McCulloch & Pitts (1943) and 
similarly Cover (1965) and Hurst (1971) working in threshold logic, 
demonstrated that bivalent neurons are capable of realizing Boolean 
functions, thereby providing a framework for representing formulae of 
the propositional calculus, an idea that inspired the early design of logic 
gates and consequently digital computers. Rosenblatt (1961), in his 

^20 Bivalent neurons are also known in the literature as 'binary' neurons.

study of perceptrons,^21 provided an algorithm, with proven and desirable
convergence characteristics, for an automated specification of the
strengths of modifiable connections associated with the (single layer
of) decision units (processing the outcomes of (static) predicates
operating on the input pattern) given the required (bivalent) mapping to be
imposed on chosen patterns; this scheme, involving neural
implementations of predicates and decision units, together with the perceptron
learning algorithm, was suggested to be a model for perception,
learnable from examples, in biological systems.
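The error-correcting character of perceptron learning can be sketched for a single bivalent decision unit. The code below follows the commonly taught perceptron update (weights moved by the signed error times the input), which is assumed here to correspond in spirit to the algorithm referred to above; the function names, the learning rate and the choice of the Boolean AND mapping are illustrative assumptions.

```python
def respond(w, theta, x):
    """Bivalent (hardlimiter) response of a steady-state additive neuron."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else 0


def train_perceptron(samples, rate=1.0, max_epochs=100):
    """Adjust weights and threshold until every sample is mapped as required."""
    n = len(samples[0][0])
    w, theta = [0.0] * n, 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, desired in samples:
            e = desired - respond(w, theta, x)
            if e != 0:
                errors += 1
                # Move the separating plane towards the misclassified pattern.
                w = [wi + rate * e * xi for wi, xi in zip(w, x)]
                theta -= rate * e
        if errors == 0:
            return w, theta, True
    return w, theta, False


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, theta, converged = train_perceptron(AND)
```

The AND mapping is linearly separable, so the loop terminates with a plane that realizes the required dichotomy; for a mapping that is not linearly separable the same loop would run to `max_epochs` without converging.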

Minsky & Papert (1969), however, established that single
(bivalent) neurons, by virtue of the linear separation induced on the space
of input patterns, are, in general, incapable of representing all Boolean
functions and showed that the parity function (also known as XOR) is one
of many Boolean functions which demand a separation different from
that provided by linear manifolds. This limitation had already triggered
an inquiry into the representational capacity of networks of neurons
under the heading of multi-layer perceptrons (cf, Rosenblatt, 1961);
however, inadequate^22 automation in the specification of
interconnection strengths in multiple layers of neurons discouraged the
deployment of the neural processing alternative, till the usage of (stochastic)
gradient descent in the learning of interconnection strengths.
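The limitation and its remedy can be sketched together. The code below searches a small grid of weights and thresholds for a single bivalent neuron realizing XOR (a finite check, not a proof), and then wires three such neurons by hand into a two-layer network that does realize it; the grid and the hand-chosen weights are assumptions made for illustration.

```python
def neuron(w, theta, x):
    """Bivalent neuron: 1 when the weighted sum reaches the threshold, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else 0


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def single_neuron_realizes_xor(grid):
    """Try every (w1, w2, theta) on the grid; True if any choice realizes XOR."""
    for w1 in grid:
        for w2 in grid:
            for theta in grid:
                if all(neuron([w1, w2], theta, x) == d for x, d in XOR):
                    return True
    return False


def xor_network(x):
    """Two hidden neurons (OR-like, AND-like) feeding one output neuron."""
    h_or = neuron([1.0, 1.0], 0.5, x)    # on when at least one input is on
    h_and = neuron([1.0, 1.0], 1.5, x)   # on only when both inputs are on
    return neuron([1.0, -1.0], 0.5, [h_or, h_and])


grid = [i / 2.0 for i in range(-6, 7)]
found_single = single_neuron_realizes_xor(grid)
network_outputs = [xor_network(x) for x, _ in XOR]
```

No single plane on the grid separates the XOR patterns, while the layered arrangement does; what was missing historically was an automated way of finding weights such as the hand-chosen ones used here.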

^21 See the description of perceptrons, by Rumelhart and McClelland, quoted earlier.
^22 The reasoning provided by Minsky & Papert (1969) has, in the literature, been
held responsible, possibly inaccurately, for an onset of near dormancy (between 1970 and
1985) in neural network research. However, most of the architectural and procedural
innovations in neural networks have been made in exactly this period!

In the case of neurons with bivalency, as provided by the hardlimiter
activation function, additive and multiplicative dynamics, especially in
the steady state, are really equivalent, ie, every dichotomy under one
scheme is realizable under the other. This equivalence is, however, not
noticed with continuous activation functions like sigmoid and radial
basis. A careful reflection on the reciprocal dependency relationship
in the steady-state version of the formal model of shunting dynamics,
as compared to a direct dependence in additive dynamics, reveals
that multiplicative dynamics allows the geometry provided by additive
dynamics to be inverted. Figure 2.3 compares the discrimination
provided, in steady state, by hardlimiter, sigmoid and radial basis functions
under additive dynamics.









Figure 2.3: Comparison of discrimination, in steady state, under additive
and multiplicative dynamics. Panels: (a), (b), (c).


Networks of Neurons


Single neurons, especially of the linear separating type, are not
powerful enough to realize (or approximate) all the decision (or classification)
functions of interest (cf, Minsky & Papert, 1969; Lippmann, 1987),
and consequently, networks of neurons have been explored in the
literature. A neural network is essentially an interconnected ensemble of
local dynamical systems (also oscillators) and the resulting dynamics,
if any, of such a system are due to the structure of inter-neuron
interconnection in addition to the dynamics supported by individual neurons.


One of the basic tenets of neuroscience is that complex behaviors
such as sensory perception or motor control, exhibited by the brain,
arise from the interconnection of neurons into networks or circuits. The
interconnection structure defines the network architecture and the current
classification of neural networks rests on this principle. Formally, a
neural network is specified by the following.


.(i) , , 
Note that the above specification does not disagree with the
definitions suggested by DARPA (Sage & Withers, 1990) and
Hecht-Nielsen (1990). Layers, in neural network related investigations of
physicists, have also been identified with fields^23 (cf, Amari, 1983;
Sompolinsky, 1987; 1988; Hopfield, 1982; Grossberg, 1982): this
terminology allows neural networks to be discussed in field theoretic
terms. The above specification is really not very general, as it precludes
feedback between non-adjacent layers, though, in keeping with the
current tradition in neural networks, incorporation of such feedback would
follow the same principles guiding intra-layer interactions.

Layering, in the specification of neural networks, admits external
stimuli only into a specific layer, termed input layer, and protects
(disallows) nodes in other layers from being directly influenced by external
stimuli. Similarly, only a specific layer, termed output layer, is allowed
to inform the external world of the processing accomplished by the
network; other layers play the role of providing a mechanism for
intermediate (and internal) computations and are not allowed to directly
influence the external environment. Layers not in direct interaction
with the external environment are labelled as being hidden.

^23 When a quantity $\psi(x)$ is defined at every point x in a certain region of a space,
physicists say that a field of the quantity $\psi$ is given (Itô, 1987).

In the neural network literature, a general disagreement is evident
in assigning the role of input layers. Consequently, confusion and
inconsistency pervade the numbering of layers and the number of
layers to be ascribed to a (layered) network of neurons, though a
consensus, borne out of our natural sense of ordering, prevails in that the
numbers assigned to layers increase with the degree of (inter-neural)
association, in a sense equating the layer number with the depth of
information processing incorporated on external inputs (stimuli).

Early research attempting to model (sensory) perception and motor
control through neural information processing mechanisms (eg,
Rosenblatt, 1958; Albus, 1975) has shown an inclination to devote the
input layer to gathering external stimuli, and the role of this layer
in information processing has been to fan out the collected stimuli
to appropriate decision units (predicates) in the network of neurons.
In contrast, research wherein neural networks are viewed as a
computational substrate (cf, Lippmann, 1987) has had the input layer
participate in decision making, and the role of distributing external
stimuli to appropriate channels (inputs) of decision units has been
merged with the functionality of variable strength interconnections.
The latter view is better suited to discussions of signal processing
with neural networks and has been incorporated in the specification
of networks of neurons.

The following notations hold in the preceding equations:

    number of layers in the network, $L$;

    number of processing nodes (ie, neurons) in layer $\ell$,
    $\ell = 1, 2, \ldots, L$;

    feed-through synaptic efficacies (ie, inter-layer interconnection
    strengths) for each processing node $j^{(\ell)}$ in layer $\ell$,
    $\ell = 1, 2, \ldots, L$;

    feed-back (recurrent) synaptic efficacies (ie, intra-layer
    interconnection strengths) for each processing node $j^{(\ell)}$
    in layer $\ell$, $\ell = 1, 2, \ldots, L$;

    propagation (and refractory) delay in the feed-through path from
    processing node $i^{(\ell-1)}$ of layer $(\ell - 1)$ to processing
    node $j^{(\ell)}$ in layer $\ell$, $\ell = 1, 2, \ldots, L$;

    propagation (and refractory) delay in the feed-back path from
    processing node $i^{(\ell)}$ to processing node $j^{(\ell)}$, both
    in layer $\ell$, $\ell = 1, 2, \ldots, L$.

In each case the node indices range over all the processing nodes of
the respective layer.

For convenience, delay terms related to external stimuli (ie, pattern
x) have been ignored: this would not (appreciably) alter the
usefulness of the discussion in the not-so-unrealistic case of
transitions in (externally applied) inputs being less frequent than
the changes in the state variables due to internal dynamics. If the
transients in neural response due to input transitions settle
rapidly, and can thereby be ignored, and if, in addition, all
feed-back delays are of unit duration (and feed-through delays are
null), then the above expressions for an L-layered neural network,
with additive dynamics in the neurons, have the following simpler
form* in steady state.


[equations (2.17a) and (2.17b): steady-state outputs of the processing
nodes $j^{(\ell)}$ in layers $\ell = 1, 2, \ldots, L$; equations
missing]


where $\nu$ is the discrete time index.


* This form is popular in the computational explorations of neural
networks and suggests the possibility of a unification of neural
networks with other dynamical systems like Cellular Automata,
(Universal) Turing Machines, Discrete Event Systems, etc, under the
heading of function fields over lattice index spaces: the functions
in the field are, in general, between (appropriately chosen)
manifolds.

From the above simplification it is easy to see that each layer of a
network of bivalent neurons, by virtue of a conjunctive logic
naturally operating on the decision regions represented by individual
nodes, partitions the space of patterns seen by the processing nodes
of that layer into numerous regions, each manifesting as an
appropriate subset of the desired partition in the space of
(externally) applied input patterns. Though no thorough
characterization of the nature of decision regions induced by
multi-layered neural networks is available, Lippmann (1987) has
argued, geometrically, that separation of the input pattern space by
networks with two or more layers is, in general, through a nonlinear
manifold, and the nature of the nonlinearity has been described in
terms of convexity* and closedness of the decision regions induced in
a space of patterns isomorphic to the Euclidean space of
dimensionality n, the number of inputs to the nodes of the input
layer.

Single layer networks (of bivalent neurons) are shown to partition
the input space into two convex regions, at least one being non-null
and all non-null regions being open (ie, unbounded). Two-layer
networks partition the input space into two regions, at least one
being non-null, with no more than one of the non-null regions being
non-convex and no more than one convex region being closed (bounded)
with a single (connected) component. A network involving a cascade of
three layers of (bivalent) neurons partitions the input space into
two regions, at least one being non-null, with all non-null regions
allowed to be non-convex and no more than one non-convex region being
closed (bounded); this closed region is, however, allowed to have
multiple (connected) components. It is commonly presumed, though not
rigorously proved, that three layers are adequate to realize all
binary partitions on Euclidean spaces.
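
The geometric picture above can be illustrated with a minimal sketch,
assuming hard-limiter nodes acting on a two-dimensional input. The
particular weights, the unit-square region and the function names are
hypothetical choices made only for illustration: a single node yields
an unbounded half-plane, while a second-layer node that conjoins four
first-layer half-planes yields a bounded convex region.

```python
def hard_limiter(v):
    """Bivalent activation: 1 if v >= 0, else 0."""
    return 1 if v >= 0 else 0


def node(weights, bias, x):
    """A single bivalent neuron: thresholded weighted sum of its inputs."""
    return hard_limiter(sum(w * xi for w, xi in zip(weights, x)) + bias)


def single_layer(x):
    """One node: the positive region is the half-plane x1 + x2 >= 1."""
    return node([1.0, 1.0], -1.0, x)


# Hypothetical first layer: four half-planes bounding the unit square.
HALF_PLANES = [
    ([1.0, 0.0], 0.0),    # x1 >= 0
    ([-1.0, 0.0], 1.0),   # x1 <= 1
    ([0.0, 1.0], 0.0),    # x2 >= 0
    ([0.0, -1.0], 1.0),   # x2 <= 1
]


def two_layer(x):
    """Second-layer node fires only when every first-layer node fires."""
    hidden = [node(w, b, x) for w, b in HALF_PLANES]
    # Conjunction of k binary inputs: their sum must reach k.
    return node([1.0] * len(hidden), -float(len(hidden)), hidden)


print(single_layer((5.0, 5.0)), single_layer((-5.0, -5.0)))  # 1 0
print(two_layer((0.5, 0.5)), two_layer((1.5, 0.5)))          # 1 0
```

The conjunction in the second layer is what makes the bounded region
convex; non-convex regions would need a further layer combining
several such convex pieces.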

* The nature of decision regions induced by (bivalent) neural
networks encourages the view that the neural network approach might
be suitable for handling convex programming and be of immense use in
related optimization and search problems.

The tradition of neuroscience being physicalist (and reductionist),
it is common to find the network structures investigated to be of a
homogeneous (also regular) kind, ie, all nodes of all layers are
similar, if not identical. Neural network taxonomy has been attempted
only on such regular structures, and the basis for the taxonomy is
provided by the nature of neural dynamics, inclusive of the type of
activation function and the specification of inter-layer and
intra-layer interconnections, which includes characterization of the
time-translation, generally delays, involved in information
propagation along these interconnections.

If all the intra-layer interconnection strengths (ie, t terms with
appropriate indices) are null (zero), then the resulting network
structure is termed a feed-forward network, else the network is said
to be recurrent. Dynamics in the network response is due to the
isolated contribution of processor (internal) dynamics, as in
feed-forward networks (eg, perceptrons), or to intra-layer
interactions, as in (auto-)associative networks,* or to a combined
influence of both factors: the dynamics could be of the additive or
multiplicative kinds, as discussed earlier. One assumption, implicit
in investigations of neural network structures, is that some, or all,
of the inter-layer interconnection strengths (feed-through and
feed-back) are non-null, else the result would be a network structure
with islands of (intra-layer) processing with apparently no means of
information interchange between layers.
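
The distinction drawn above reduces to a test on the intra-layer
strengths. The following is a minimal sketch, assuming the t terms
are stored as one square matrix per layer; the storage layout and the
names used are illustrative assumptions, not the notation of the
formal specification.

```python
def is_feed_forward(intra_layer_strengths):
    """Return True when every intra-layer strength (t term) is zero."""
    return all(
        t == 0
        for layer in intra_layer_strengths
        for row in layer
        for t in row
    )


# Hypothetical two-layer networks with two nodes per layer.
no_lateral = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
with_lateral = [[[0, 0], [0, 0]], [[0, -0.5], [-0.5, 0]]]

print(is_feed_forward(no_lateral))    # True: feed-forward network
print(is_feed_forward(with_lateral))  # False: recurrent network
```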


* Associative networks, of the automorphic and heteromorphic kinds,
have been discussed in the next article.


Neural Network Architectures


Interconnected ensembles of neurons, or neuron-like dynamical
systems, while a plausible framework for the study of cognitive
capacities exhibited by biological systems, in particular human
beings, and also for modeling complex system behavior, are too
general a framework, even with layering, to be readily useful in
neuro-science as well as neuro-engineering. One of the crucial issues
in a study of neural networks is to be able to associate classes of
neural network structures with appropriate classes of information
processing. Architecture,* a term incorporated into studies of
information processing (computational) systems to mean a description
of components involved in a system whose organization is related to
the structure, function and performance of the system, the
correspondences being conjointly discovered or evaluated, provides a
reasonable medium for the study of the information processing
potential of specific neural network structures.

* Interpretation of the term architecture in computational systems is
not without controversy. The most common, and forceful, of
interpretations suggests that architecture is the perception of a
system at a microscopic, rather than macroscopic (or molar), level of
discourse: specifically in computer systems, architecture is
considered to be the assembly (ie, hardware) level description of
systems as compared to application (ie, software) level descriptions
(cf, Dasgupta, 1984). Baer (1984), on the other hand, opines that
architecture involves an account of the organization of components,
enunciating the structure, function and performance of the system as
a whole. This view, while not precluding the former, punctuates the
interplay of synthesis and analysis, between the microscopic and
macroscopic levels of discourse. In the ensuing discussion, I prefer
to use the second interpretation.

Neural network architectures, in view of the formal specification
introduced in the previous article, essentially describe the
processing nodes in number and type, the inter-processor interactions
in terms of the nature and strength of interconnections (which also
specifies layering), and the means for deriving the requisite
interconnection strengths from the supplied repertoire of examples,
ie, the training set, also known as the knowledge base, of the
required processing to be incorporated (realized) by the network. The
dependence of interconnection strengths on training samples,
essentially a representational issue, is specified through a learning
procedure: the learning process is, in general, iterative and it is
of importance to guarantee the convergence of the particular
procedure used. In the following, I will briefly present some of the
prominent architectures from a signal processing perspective,
focussing on the functional aspect of information processing provided
by a neural substrate. A common trend in presentations of neural
network investigations is to refer to an appropriately ordered
collection of inter-processor (ie, inter-neuron) interconnection
strengths as the architecture of the network.

Architectural types of neural networks are classified on the basis
of the nature of associations, number of layers, recurrent dynamics
and adaptivity in interconnection strengths. Networks can be
auto-associative or hetero-associative, depending on whether or not
the space of patterns in the input is the same as that in the output.
In both cases, the association sought could be bi-directional
(invertible), though such a requirement is very rare and, unless
otherwise mentioned, bi-directionality of association will be assumed
absent. The number of layers can be either single or multiple, and
recurrence in networks, if present, is either of the laterally
interacting kind (with one, or more, layers exhibiting intra-layer
interactions), or due to specific inter-layer feedback connections.

I assume that the interconnection strengths of all nodes need to be
learnt, and hence training has not been invoked as a feature for
classification, though adaptivity of the interconnection strengths
during usage of the networks has been used as an important criterion
for classification. Networks wherein interconnection strengths are
not adapted have two distinct phases: that of learning, wherein the
interconnection strengths are decided, and that of usage, wherein the
learnt interconnection strengths are employed to provide suitable
decision functions. Differences in processing units, if any, are not
explicitly indicated, to simplify the understanding of functional
characteristics. Note that while neurons have been formulated as
dynamical systems, none of the existing (prominent) architectures
makes explicit use of neural dynamics, and dynamics, if any, in
network response is due to inter-node interactions alone.

Perceptrons (cf, Rosenblatt, 1958), the first (and non-trivial)
architecture to be proposed, are hetero-associative, non-recurrent
networks wherein the decision units (neurons) are organized in single
or multiple layers, the interconnection strengths being non-adaptive.
With a hard-limiter activation function in all nodes, the network is
viewed as representing Boolean functions (formulae of propositional
calculus), and also as inducing decision regions in the input space.
As discussed earlier, layering is needed to increase the assurance of
representation, especially when the decision functions to be
represented involve separation by nonlinear manifolds. Multi-layered
feed-forward networks* have been the subject of prolonged
investigation.

Historically the first architectural innovation subsequent to
perceptrons, provided by Kohonen (1972), introduced lateral
interaction into neural information processing. Known in the
literature as the Kohonen layer, the network structure consists of a
single layer of (laterally) interacting nodes, with symmetric (ie,
unprioritized) interprocessor interactions, the profile of
interconnections being derived by an 'On Center, Off Surround' rule,
compelling near neighbour interactions to be supportive, if not
excitatory, and interactions of (not too) distant neighbours to be
inhibitory, if not ignored. As each processing unit attempts to
maximize its own output and at the same time suppress activity in
other nodes, this network structure is also characterized as being
based on the principle that the winner takes all.
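
The winner-takes-all behaviour can be sketched as follows. This is a
simplified illustration, assuming each node supports only itself and
inhibits every other node with one common strength, which is cruder
than the distance-dependent 'On Center, Off Surround' profile
described above. The function name, the inhibition strength and the
initial activities are hypothetical.

```python
def winner_take_all(activity, inhibition=0.2, max_steps=100):
    """Iterate lateral inhibition until at most one node stays active.

    Each node keeps its own activity and is suppressed by the summed
    activity of all other nodes; negative activities are clipped to zero.
    The inhibition strength is assumed to be below 1 / (nodes - 1).
    """
    y = list(activity)
    for _ in range(max_steps):
        total = sum(y)
        y = [max(0.0, yi - inhibition * (total - yi)) for yi in y]
        if sum(1 for v in y if v > 0) <= 1:
            break
    return y


settled = winner_take_all([0.3, 0.9, 0.5])
winner = max(range(len(settled)), key=settled.__getitem__)
print(settled, winner)
```

With the initial activities shown, only the node that started with
the largest activity remains non-zero after a few iterations.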

* These networks are also termed backpropagation networks in the
literature, to indicate that learning is accomplished by error
back-propagation. However, I will refrain from using this term as it
does not explicitly indicate the organizational specifics of layering
and non-recurrent interconnections. Further, backpropagation, as a
learning mechanism, is not restricted to multi-layered feed-forward
networks alone.

It is interesting to note that this hetero-associative network
structure, in view of the competitive nature of information
processing, provides a generalization of the notion of multivibrators
(Millman & Halkias, 1967), the basis of sequential circuits in
digital electronics, and has been incorporated into several
subsequent architectures. Kohonen layers have been applied in
associative memories (Kohonen, 1984), topology feature maps (Kohonen
& Makisara, 1986), the Hamming net and several other clustering
situations. One of the attractions of this network structure is the
unsupervised (actually semi-supervised) learning procedure.

Bidirectional Associative Memories (Kosko, 1987; 1988), Cognitron
(Fukushima, 1975), Neo-Cognitron (Fukushima & Miyake, 1982;
Fukushima, 1987), Counter Propagation Networks (Hecht-Nielsen,
1987a) and networks of Adaptive Resonance Theory (Carpenter &
Grossberg, 1987a) are some prominent architectures that make use of
competitive information processing (ie, lateral interaction layers).
The auto-associative network structure of Hopfield (1982) is similar
to Kohonen's lateral interaction layer; however, the state encodings,
interconnection profile and type of activation function being
different, the nature of the dynamics is slightly altered.

In both network structures, settling of the network response to a
(stable) attractor is very important. Auto-associative networks, in
view of automorphic dynamics, have been used in associative memories
and in solving problems of optimization and search (cf, Baum, 1986;
Jeffrey & Rosner, 1986; Saylor & Stork, 1986), and have also
motivated the entry of physicists into neural information processing.
The computation of auto-associative networks is compared to Ising
spin systems (cf, van Hemmen, 1986; Amit, 1989), thereby boosting
studies in neural network dynamics.
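
Settling to a stable attractor can be illustrated by a minimal sketch
of an auto-associative network. The sketch assumes bipolar (+1/-1)
states, outer-product (Hebbian) interconnection strengths with no
self-connections and asynchronous hard-limiter updates; these are
common textbook choices, and the function names and the stored
pattern are hypothetical.

```python
def train_hebbian(patterns):
    """Outer-product interconnection strengths with a zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def settle(w, state, max_sweeps=20):
    """Update nodes one at a time until a full sweep changes nothing."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            field = sum(w[i][j] * s[j] for j in range(len(s)))
            new = 1 if field >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s


stored = [1, -1, 1, -1, 1, -1]
w = train_hebbian([stored])
probe = [1, -1, 1, -1, -1, -1]   # stored pattern with one element flipped
recalled = settle(w, probe)
print(recalled == stored)        # True
```

The corrupted probe settles back onto the stored pattern, which is
the sense in which such a network acts as an associative memory.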

Bi-directional associative memories, an extension of auto-associative
networks, realize hetero-associative invertible (bi-directional)
maps. At the organizational level, two lateral interaction layers are
made to interact through inter-layer interconnections. Counter
propagation networks realize hetero-associative invertible
(bi-directional) maps with a single lateral interaction layer and
provision made for representing dependencies between inputs and
outputs. The Neo-cognitron is hetero-associative and consists of
multiple layers of lateral interaction: the interconnection strengths
are not adapted during network use. This architecture is claimed to
be capable of providing visual information processing, with
(intra-pattern) translational and rotational invariances. It is of
interest to note that no architecture has yet been proposed to
realize auto-associative maps without recurrence.*


* Usui, Nakauchi & Nakano (1991), however, have suggested a network
for realizing invertible maps.


Learning and Generalization in Neural Networks

Neural networks, ie, ensembles of interconnected processing (decision
making) units, invariably exhibit a dependency, in the realization of
desired processor functionality, on the interconnection strengths
between the constituent processing units and the weightages
associated with input and output patterns (signals). Connectionist
processing, grounded in this dependence, is commonly projected as an
incorporation of available (desired) knowledge in the interconnection
strengths: this interpretation necessitates that the interconnection
strengths be considered as being separate from the processing units.

Equally legitimately, connectionist processing can be considered as
processor realization achieved by networks of parametrically selected
processing options. The parameterization (of operation, ie, decision
making) in the participating processing units is incorporated by the
interconnection strengths associated with corresponding nodes. This
alternative treatment of connectionism, though not yet popular,
serves to provide a framework for unifying neural networks with
networks of automata, typically schema of Turing Machines and
cellular automata: the latter two are representative of symbolic
computation.

A salient aspect of neural network activity, in all treatments of
connectionism, is to be able to relate admissible values of
interconnection strengths to the (overall) processing functionality
realized. Depending on the context, the automated process of
explicating the dependency of processing on interconnection strengths
is termed learning, in situations wherein specific aspects of the
processing functionality, typically examples (also prototypes) of the
input-output association provided, are available and the task is to
recreate these very aspects in the network, or is studied under the
general heading of pattern formation,* in situations wherein certain
symmetries regarding the interconnection strengths are known
(speculated), generally through constraints imposed by physicalist
reductionist renderings to model (theory) building, and the desire is
to explore the kinds, and nature, of processors that can be realized
with the specific choice of interconnection strengths.

Considerations of computability (cf, Hopcroft & Ullman, 1989), in
conjunction with the urge to maintain compatibility with biological
metaphors, constrain the problem of 'learning from examples' to one
of stating the necessary and sufficient choices to be made in
relation to the interconnection strengths in a network, given a
finite number of instances of the desired processor functionality.
This repertoire of instances, also prototype input-output
associations, is termed the training set, or knowledge base, and in
this constrained situation learning seeks to establish relationships
between the interconnection strengths of a chosen network and the
available (given) training set.

* Exploration of the influence of interprocessor interconnection
strengths on pattern formation is commonplace in evolutionary
processing like Hopfield networks, genetic algorithms and cellular
automata, and provides an attractive environment for recasting search
problems.

Neural networks, acclaimed as model-free estimators applicable in
almost all situations of function approximation (processor
realization), are expected, to maintain the claims of universality,
to be equipped with learning procedures that work reliably regardless
of whether the training set is deterministic or drawn at random (with
an appropriate, possibly unknown, distribution), and that can, to
some reasonable degree, state a measure of confidence of learning
given the statistical nature of the processing instances in the
training set.

Finiteness of the number of instances in the training set restricts
specification of processing characteristics to a narrow, possibly (in
itself) uninteresting, region of the domain on which the neural
network is required to be defined and, consequently, mechanisms for
extending the desired peculiarities of processing to regions of the
input space not covered by the training set are needed: the process
of establishing such extension of function evaluation is termed
generalization. While function evaluation at input space positions
other than those incorporated in the training set is guaranteed,
axiomatically, by activation functions chosen to be non-constant with
no more than a finite number of discontinuities, as, eg, in sigmoidal
(including hard-limiter) and radial basis functions, extension of
function evaluation to realize the processing characteristics
specified in the training set imposes constraints on the choice of
interconnection strengths in the network.

Extension, in the representation, of the desired processing objective
to the (entire) input space is, generally, assured by identifying a
collection of instances of association between inputs and outputs
which are similar to, yet distinct from, those in the training set,
and are used for the explicit purpose of (cross) validation of the
representation suggested for the training set: this collection is
termed the test set. The task of seeking a representation of the
specified knowledge (training set) is continued until evaluations of
the processor, whose synthesis is guided by representations, in terms
of interprocessor interconnection strengths, sought through learning,
over input positions described by instances in the test set are
satisfactory, ie, match (the specification in the test set) to within
prespecified limits of tolerance.
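
The interplay of training set, test set and tolerance described above
can be sketched as a loop. The sketch assumes a single hard-limiter
node trained with the classical perceptron update, a tolerance
expressed as the fraction of misclassified test instances, and small
hypothetical training and test sets; none of these specifics are
taken from the text.

```python
def predict(w, b, x):
    """Hard-limiter decision of a single node."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def error_rate(w, b, samples):
    """Fraction of instances whose decision differs from the target."""
    return sum(predict(w, b, x) != t for x, t in samples) / len(samples)


def learn(train, test, tolerance=0.0, rate=1.0, max_epochs=100):
    """Repeat passes over the training set until the test set is matched
    to within the tolerance (or the epoch budget is exhausted)."""
    w, b = [0.0] * len(train[0][0]), 0.0
    for epoch in range(1, max_epochs + 1):
        for x, t in train:
            err = t - predict(w, b, x)
            w = [wi + rate * err * xi for wi, xi in zip(w, x)]
            b += rate * err
        test_err = error_rate(w, b, test)
        if test_err <= tolerance:
            return w, b, epoch, test_err
    return w, b, max_epochs, error_rate(w, b, test)


# Hypothetical training set (logical AND) and a test set of nearby,
# but distinct, instances of the same association.
TRAIN = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
TEST = [((0.1, 0.1), 0), ((0.9, 1.0), 1), ((0.1, 0.9), 0), ((0.8, 0.2), 0)]

w, b, epochs, test_err = learn(TRAIN, TEST)
print(w, b, epochs, test_err)
```

Stopping on the test set rather than on the training set is the point
of the sketch: learning is judged by how well the representation
extends beyond the instances used to obtain it.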


2.3 Neural Signal Processing: A thematic reconstruction

Information processing, in particular the perceptual categorization
of (visual) patterns, has been a constant focus of neural networks
since their inception through the perceptrons of Rosenblatt (1958).
However, signal processing with neural networks, as a specific
research activity, originates in the work on ADALINES and MADALINES
by Widrow (1959) and Widrow & Winter (1988). Despite the early
origin, the pace of research in neural signal processing has
accelerated only in the previous decade and, of the many influences
encouraging signal processing with neural networks, the tutorial
paper of Lippmann (1987) is noteworthy.

The essential problem of neural signal processing is to realize the
desired processor as a networked ensemble of basic decision stages.
Processor representation, related to function approximation, sought
through a repertoire of input-output correspondences rather than
symbolic/functional forms of dependency relationships, is associated
with biological metaphors, notably learning by examples.

Neural network research, originating in the work of McCulloch &
Pitts (1943) and widely acclaimed following the perceptrons of
Rosenblatt (1958), is reported, in the literature, to have entered a
near dormancy following the sharp criticism offered by Minsky &
Papert (1969). However, neural networks have resurfaced as a major
research activity following new directions provided by Kohonen
(1984), Hopfield & Tank (1985), Denker (1986), Grossberg (1982) and
McClelland, Rumelhart, et al (1986a, 1986b).*

Though neural networks research has had automated information
processing as a consistent central theme, the underlying research
interests have not been identical all through. Prior to the
reemergence of neural networks, significant emphasis had been laid on
realizing (probabilistic) decision functions with multi-layered
neural networks: it is in this context that the powerful remark of
Minsky & Papert (1969), criticizing the absence of a learning theorem
for multi-layered perceptrons, is to be appreciated.

* Signal processing with neural networks, though initiated by
Caianiello (1961), Fukushima (1969), Hopfield & Tank (1985), Kohonen
(1980, 1981) and others, has found wide acceptance only since the
tutorial paper by Lippmann (1987).

Error back-propagation, a mechanism (attributed to Rumelhart, Hinton
& Williams, 1986) which provides a reasonably general solution to the
problem of learning of weights in multi-layered neural networks, and
the neural information processing schemes suggested by Hopfield and
Kohonen have triggered two distinct trends in neural signal
processing. One trend, due to physicists, yet significant in signal
processing, has focused on the information processing potential as a
function of the structural encoding, ie, the physics of neural
networks. In this study, neural networks are necessarily of the
recurrent kind, and the patterns of evolution of the (neural
activation) state vector consequent on structural impositions on the
inter-processor interactions are in focus.

Neural networks, viewed in this perspective, have been associated
with other similar schemes of interconnected ensembles of (local)
oscillators, notably Ising spin models of statistical thermodynamics
(cf, van Hemmen, 1986) and Boltzmann machines (Hinton, Sejnowski &
Ackley, 1984). The dynamics in such networks have been utilized in
formulating search problems, in particular those that involve
constraints, eg, optimal solutions for the 'Travelling Salesman
Problem'. Neuro-biologists and neuro-anatomists have benefitted a
great deal from these models in trying to identify specific
structures of inter-processor interaction (cf, Peretto, 1992).

The other research trend is to focus on the representation potential
of feed-forward neural networks with a single (hidden) layer of
processing, wherein the outputs of the processors of this layer are
linearly combined to derive the requisite output. In this form, the
problem is close to the heart of approximation theorists, and the
major thrust in the study is to overcome the limitations of the
(elementary) perceptron-like single layer of processing nodes through
the use of different kinds of nonlinear association mechanisms.

A popular choice has been to use radial basis functions (Girosi 
& Poggio, 1991), essentially nonlinear functions with localized influ- 
ence, for implementing the activation function. One of the guiding 
principles in deciding the suitability of activation functions is to ensure 
good approximation characteristics and a simplification of the problem 
of learning of the weights associated with the constituent processors of 
the neural network. 
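
A network of this kind can be sketched compactly: one layer of radial
basis nodes followed by a linear combination of their outputs. The
sketch assumes a scalar input, Gaussian basis functions centred on
the training inputs and output weights obtained by exact
interpolation; the width, the target function and all names are
hypothetical choices for illustration.

```python
import math


def gaussian(r, width):
    """Radial basis function with localized influence."""
    return math.exp(-(r / width) ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_rbf(centres, targets, width):
    """Output weights that reproduce the targets at the centres."""
    phi = [[gaussian(abs(x - c), width) for c in centres] for x in centres]
    return solve(phi, targets)


def rbf_output(x, centres, weights, width):
    """Linear combination of the hidden-layer responses."""
    return sum(w * gaussian(abs(x - c), width)
               for w, c in zip(weights, centres))


centres = [0.0, 0.5, 1.0, 1.5, 2.0]
targets = [math.sin(c) for c in centres]
weights = fit_rbf(centres, targets, width=0.5)
print(rbf_output(0.75, centres, weights, 0.5), math.sin(0.75))
```

Because the output is linear in the weights, learning reduces to
solving a linear system, which is one sense in which such activation
functions simplify the problem of learning the weights.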

Restriction of decision making to a single layer (of sufficiently
many processors) is not without reason. A justification for this
choice is provided by a theorem of Boolean function representation
due to Shannon (Kohavi, 1978): any Boolean function is representable
in an AND-OR-INVERT processing scheme. Lippmann (1987), and
subsequently others, have identified these three distinct logical
operations in neural networks. The input layer (of terminations
followed by weighted channels) provides inversions, the single layer
of nonlinear processing incorporates logical
conjunctions/disjunctions, as the case may be, and the final
summation level provides the remaining logical function.
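
The three roles identified above can be seen in a minimal sketch for
the exclusive-OR function. The weights and thresholds are
hypothetical choices: negative weights on the input channels act as
inversions, each hidden hard-limiter node realizes one conjunction,
and the final summation acts as the disjunction because the two
conjunctions are never true together.

```python
def step(v):
    """Hard-limiter: 1 if v > 0, else 0."""
    return 1 if v > 0 else 0


def xor_network(x1, x2):
    """XOR as (x1 AND NOT x2) OR (NOT x1 AND x2)."""
    # A negative weight inverts an input; the threshold realizes the AND.
    h1 = step(1.0 * x1 - 1.0 * x2 - 0.5)    # x1 AND NOT x2
    h2 = step(-1.0 * x1 + 1.0 * x2 - 0.5)   # NOT x1 AND x2
    # The two conjunctions are mutually exclusive, so a plain sum is the OR.
    return h1 + h2


for a in (0, 1):
    for c in (0, 1):
        print(a, c, xor_network(a, c))
```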

Another, more technical, reason underlying the choice of a single
layer of nonlinear processing is based on the computational
complexity of the problem of loading the training data (ie, the
learning problem). Stephen Judd (1990) has pointed out that as the
number of nonlinear processing (decision making) layers increases,
the complexity of the learning problem increases, whereby it is
optimal, in terms of computational resources, to specify shallow
architectures, ie, those with fewer layers but sufficiently many
processing nodes in the layers.

In this section, I will present a cursory review of the research
relevant to neural signal processing. This review, inclined towards
neuro-engineering as compared to the previous section, will be
initiated with a historical perspective of neural signal processing.
In view of the fact that research in neural networks is being
contributed by investigators from several fields, principally
physical sciences, mathematical sciences and the engineering
community, several conflicting notations exist, disallowing their
concomitant usage. Therefore, I have opted to present this review, as
in the case of the previous section, with my own notation, a
significant proportion of which has already been introduced earlier.


History of Neural Signal Processing 

Neural signal processors, in both of the earlier mentioned trends of
research, have been considered, in general, as nonlinear processors
with interpretations of shift-invariance and incorporating causality
when applied to filtering situations. In addition, neural signal
processors have occasionally been discussed in processing contexts
involving adaptivity and/or stochasticity. Processors realized with
neural networks are classified on the basis of the nature of
association (as hetero-associative or auto-associative), degree of
layering (as single layered or multi-layered, based on the number of
'hidden' layers of decision making) and the incorporation of
recurrence and/or competition in processing.*

Adaptive linear elements (ADALINES), essentially linear filters
subjected to threshold comparison (sign test), are operationally
described by (see Widrow & Lehr, 1990)

$$ v(x_i) = w_i^{T} x_i, \tag{2.18a} $$
$$ y(x_i) = \sigma_h(v(x_i)), \tag{2.18b} $$

where $i = 0, 1, \ldots$, and $\sigma_h$ is the hard-limiter (also
1-bit quantizer) function. Despite adaptivity of filter weights, the
above scheme corresponds to the formal model of neurons wherein
shunting and dynamics are suppressed and linear separability is
imposed on the corresponding observation (ie, input) space.

This scheme, representative of the earliest attempts in neural signal 
processing, focuses on hetero-associative maps (with no recurrence and 
competition) and the problem of learning weight values given a train- 
ing set is looked upon as the operationally equivalent task of adapting 


[32] Though dynamical neurons are not unknown, no major processing scheme employing
such processors has yet been studied and dynamics in processor response has always been
incorporated through recurrence or competition. It is noteworthy that though recurrence
and competition share a great deal of commonality in abstraction, the differences in
interpretative content necessitate that these two be viewed as distinct concepts.
(filter) weights $\mathbf{w}$ to a (random) sequence of patterns drawn from the
training set. In signal processing applications the adaptation of filter
weights, in accordance with the (time) history of function approxima-
tion error, is commonly by a least squares approach, typically the 'Least
Mean Squares' (LMS) algorithm, often called the 'Delta Rule' (cf, Widrow
& Hoff, 1960), though this algorithm is not the most popular in adap-
tive signal processing as convergence is not guaranteed in any sense
stronger than that of expected error.
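
The adaptation of weights to a random sequence of training patterns can be sketched as below. This is an illustrative rendering of the standard delta rule, not a transcription of any particular published implementation; the learning rate, epoch count, seed and the bipolar AND training set are all assumptions.

```python
import random


def lms_train(samples, n_weights, rate=0.01, epochs=500, seed=0):
    """Delta rule: w <- w + rate * (d - w.x) * x, with patterns drawn in random order."""
    rng = random.Random(seed)
    w = [0.0] * n_weights
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, d in data:
            # Error of the linear response, measured before the sign test.
            error = d - sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + rate * error * xi for wi, xi in zip(w, x)]
    return w


# Bipolar AND; the constant first component acts as a bias input.
training_set = [([1, -1, -1], -1), ([1, -1, 1], -1), ([1, 1, -1], -1), ([1, 1, 1], 1)]
w = lms_train(training_set, 3)
decisions = [1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
             for x, _ in training_set]
```

Because the linear response cannot match the targets exactly, the weights keep fluctuating around the least squares solution; a small rate keeps that fluctuation small, which reflects the sense in which convergence holds only for the expected error.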

Since single ADALINES, ie, linear classifiers, have a limited repre-
sentation potential (in terms of separation), as pointed out by Cover
(1965) and Minsky & Papert (1969) and subsequently others, non-
linear approaches have been considered essential for better classifier
representation. Of these, only two major traditions of realizing hetero-
associative (non-recurrent, non-competitive) maps will be considered.

One approach, originally due to Specht (1967b) and Ivakhnenko
(1971), relies on subjecting suitably preprocessed versions of the input 
signals (patterns) to linear classification; the preprocessing is chosen to 
impose polynomial transformations on the input pattern vector, thereby 
presenting second and/or higher order correlations to the linear clas- 
sifier. Linear classification on polynomially transformed input vectors 

[33] Adaptation algorithms for linear (and nonlinear) filters have been discussed exten-
sively in the literature on signal processing. Commonly the 'Recursive Least Squares'
(RLS) algorithm (Haykin, 1984) is used in parallelized form (as the PRLS algorithm) due to its
superior convergence characteristics. Chaturvedi (1994) has shown that PRLS schemes
provide a unifying thread for RLS and LMS approaches, and, in this unified framework,
has compared the relative performances of the two schemes.
introduces curvature in the separation surface, ie, allows the decision
regions to be non-convex, though simply connected.

Known in the literature as polynomial neurons and higher-order
neurons (Spirkovska & Reid, 1992), the operational form of linear
classification on polynomially preprocessed inputs is given by

$$\eta(\mathbf{x}) = \sum_{i=0}^{N} \sum_{j} w_{ij}\,\phi_{ij}(\mathbf{x}), \qquad (2.19a)$$
$$y(\mathbf{x}) = \sigma(\eta(\mathbf{x})), \qquad (2.19b)$$

where $\phi_{ij}(\mathbf{x})$ refers to the jth enumeration of homogeneous polynomi-
als of degree $i$ in the elements of $\mathbf{x}$ (ie, rational varieties of order $i$), $N$
refers to the largest degree, possibly infinite, of polynomials relevant to
the specific approximation task at hand and $\sigma$ is the familiar sigmoidal,
or hard-limiting, nonlinearity. The form for assigning $\eta$ corresponds
closely with Volterra filters used in nonlinear signal processing.

Discrimination need not be provided by nonlinear activation func- 
tions alone. In fact, nonlinearities in the weighting mechanism too can 
offer interesting discrimination even if the activation function is linear. 
The following is a generalized (steady state) model of isolated neurons:

$$\eta(\mathbf{x}) = \sum_{p=1}^{P} \sum_{j=1}^{p} \sum_{i=1}^{n} \cdots, \qquad (2.20a)$$
$$y(\mathbf{x}) = \sigma(\eta(\mathbf{x})), \qquad (2.20b)$$

where a polynomial discrimination of order $P$ is being assumed, $n$ is
the number of channels (inputs) and $w$ with the relevant indices is the
functional interconnection strength.

Linear discrimination is obtained when w, in all the channels, is op- 
erationally equivalent to multiplication by a (channel specific) constant, 
and P = 1. Though the above expression depicts an instar neuron, a 
similar generalization to instar-outstar neurons is not difficult to visu- 
alize. Functional link nets of Pao (1989) are based on instar-outstar 
neurons with a similar generalization, wherein discrimination is of or- 
der 1 and the functional interconnection strengths, generally drawn 
from the space of trigonometric, or exponential, functions, incorporate 
spectral synthesis as an essential ingredient of neural information pro- 
cessing. 

Davidson & Hummer (1993) have pointed out that processors 
with the functional form 

$$\eta(\mathbf{x}) = \bigvee_{i=1}^{n} \left( w_i \wedge x_i \right), \qquad (2.21)$$

where $\wedge$ and $\vee$, respectively, denote the Minkowskian operations of
minimization and maximization (cf, Serra, 1982), when interconnected
in a manner similar to conventional neural networks, are capable of rep-
resenting morphological operations. Morphology neural networks, as
they are termed, have been suggested for image processing applications.
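
To make the role of minimization and maximization concrete, the fragment below evaluates a single node under the assumption that its response is the maximum, over channels, of the minimum of weight and input. The weight and input values are hypothetical.

```python
def max_min_node(weights, x):
    """Response of one node: maximum over channels of min(weight, input)."""
    return max(min(w, xi) for w, xi in zip(weights, x))


# Hypothetical weights and input pattern with values in [0, 1].
response = max_min_node([0.2, 0.9, 0.5], [0.7, 0.4, 0.6])
```

Unlike the weighted sum of a conventional neuron, the response here is always equal to one of the weights or one of the inputs, so no new signal values are created by the node.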

The second approach in nonlinear classifier representation is to con- 
sider a layering of decision making stages: each layer is composed of 
an adequate number of linear classification elements (neurons). Known
as feed-forward networks and MADALINES (for Many ADALINES), such
layered ensembles of decision elements[34] induce non-convex partitions
and Lippmann (1987) points out that the larger the number of layers, the
greater is the possibility of representing fragmented dichotomies, ie, di-
chotomies with disconnected components. It is not difficult to visualize
that layered networks, with sufficiently many layers, are capable of 
representing all functions realized through polynomial neurons. 

Widrow, Winter & Baxter (1988) establish that MADALINES with
a majority logic at the final stage capture rotational and translational
invariances in patterns; however, it is essential that the multitude
of hypotheses related to the several translated and rotated versions
of the pattern to be recognized be detected in distinct networks. In
principle, this scheme is no different from that used in array-detectors
(cf, Proakis, 1989), which resemble 'Linear Discriminant Functions'
suggested by Nilsson (1965).

Networks of polynomial neurons have also been shown to incorpo-
rate translational and rotational invariances (cf, Spirkovska & Reid,
1992), though, in this case, explicit detection of distinct translated and
rotated versions of the patterns, and the subsequent majority logic are
[34] Aleksander (1983a) and Stonham (1983) discuss pattern discrimination and recog-
nition, in the context of patterns described over Boolean (bivalent) spaces, through net-
works of memory elements: in these networks, the role of neurons (ADALINES) is realized
with (programmable) memories storing the essential nature of the input-output associa-
tion. The equivalent of learning is accomplished by identifying the appropriate contents
at the various 'addressable' locations of the memory elements.
not needed. Invariances, in pattern recognition, are attributed to spe-
cific higher order correlations of the input signal (pattern).

Hetero-associative neural signal processors have not been restricted 
to classifier representation and realization through non-recurrent, non- 
competitive means. Feed forward neural processors have been dis- 
cussed in the general context of function approximation, of which clas- 
sifier representation is a specific case. In this context, the nonlinear 
activation function $\sigma$ is generally not discrete valued, but takes on a
continuum of values: often a sigmoidal function. Studies on the approx-
imation potential of neural networks and related convergence issues
have brought to light the importance of the nature of nonlinearity in
the activation function.[35]

These studies (see Girosi & Poggio (1991)) have revealed that 
nonlinearities with a global influence, sigmoidal function being a typical 
example, are unsatisfactory for function approximation, with a single 
layer of decision making, as the number of decision making units (ie, 
neurons) is undesirably large.[36] As a consequence, the convergence
characteristics and the assurance of approximation are not adequate. 

Present research on neural function approximation focuses on net-
works with a single layer of nonlinear processing and exhibits an em-
phasis on nonlinear functions with local influence, typically radial basis
functions (op cit), to facilitate simpler (compact) representation of
functions inducing partitions with non-convex pre-images. Gabor func-
tions and Wavelets (Chui, 1992; Daubechies, 1992), local functions
lately popular in signal representation and processing, have also been
suggested (see Daugman, 1988 for use of Gabor functions and Zhang
& Benveniste, 1992; Pati & Krishnaprasad, 1993 for the adoption
of Wavelets) as suitable candidates for use as activation functions $\sigma$ of
neurons.

[35] However, such studies are largely limited to single (hidden) layer networks.

[36] This notion has been captured more precisely by Cybenko (1989) in terms of the
denseness, of the space of functions realized, in the space of continuous functions.

Radial basis function networks (cf, Poggio & Girosi, 1990; Haykin,
1994) differ from the neural network architectures discussed earlier
in the sense that discrimination is effected non-monotonically on dis-
criminants having quadratic, rather than linear, variation with input
patterns. Inspired by a consideration of approximation using regu-
larization theory, these networks approximate the desired function as a
member of the linear span of Green's functions of an appropriately cho-
sen self-adjoint differential operator: the basis functions turn out to be
Gaussians when the chosen differential operator is translationally and
rotationally invariant. Relative ease in parameter specification and
compact network structures have made radial basis function networks
popular in signal processing contexts.
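
The approximation as a member of the linear span of Gaussians can be sketched as follows. The centers, widths and coefficients are hypothetical values chosen for illustration, and the Gaussian is written in the common form exp(-||x - c||^2 / (2 s^2)), which is an assumption about normalization.

```python
from math import exp


def gaussian_rbf(x, center, width):
    """Gaussian of the squared Euclidean distance between x and the center."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return exp(-sq_dist / (2.0 * width ** 2))


def rbf_network(x, centers, widths, coeffs):
    """Linear combination of the responses of the radial basis nodes."""
    return sum(c * gaussian_rbf(x, m, s)
               for c, m, s in zip(coeffs, centers, widths))


# Hypothetical two-node network on a two-dimensional input space.
centers = [[0.0, 0.0], [2.0, 0.0]]
widths = [1.0, 0.5]
coeffs = [2.0, -1.0]
value = rbf_network([0.0, 0.0], centers, widths, coeffs)
```

Each node responds non-monotonically: its output peaks at the center and decays in every direction, so the discriminant depends quadratically, not linearly, on the input.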

In view of the fact that the discriminants in radial basis function
networks are quadratic functions of the inputs, typically formulated as

[37] Ridge functions of Ya Lin & Pinkus (1993) are functionally similar to radial basis
functions, with similar approximation characteristics.

the (Euclidean) norm of translated versions of the input pattern, the
decision regions of isolated processing nodes are (hyper) spheres in the 
(Euclidean) space of inputs, the center being specified by the translation 
vector involved in evaluating the discriminant. Such decision units 
have been termed as being diameter limited by Minsky & Papert 
(1969). 

Networks of such processing nodes have been shown (op cit) to be
incapable of representing predicates of connectedness, relevant in per-
ceptual processing stemming from computational geometry. Wavelet
networks suggested by Zhang & Benveniste (1991, 1992) and Pati
& Krishnaprasad (1993) too exhibit a diameter limitedness and suf-
fer from the same limitations. These limitations, while inconsequential
in the context of function approximation, are important when the ap-
proximation is given a cognitive/perceptual connotation.

Dynamics in the response of neural signal processors, generally
incorporated through competition or recurrence, was initiated
by Kohonen (1984) and Hopfield & Tank (1985), respectively. Of
these, competitive networks have been considered mainly in situations 
demanding interpretation in terms of feature extraction and (unsuper- 
vised) clustering, while recurrent networks are employed in realizing 
automorphic transformations, preferably lacking ergodicity, common 
in modeling of physical phenomena and in solution of search problems 
with a requirement of optimization. 
These networks are functionally similar to Equation 2.16a; however, they
differ in the manner of state encoding and types of nonlinearities.[38]
Such networks have been studied as associative memories wherein
pattern recall given (partial) cues has been likened to human mem-
ory. Competitive networks provide hetero-associative maps, while re-
current networks, of the Hopfield kind, realize auto-associative maps.
Each node in a competitive network is identified with a distinct concept
(cluster) and it is not uncommon to find this association interpreted in
the same sense as grand-mother neurons (cf, Hofstadter, 1979).


[38] Kohonen networks assume unipolar bivalent neurons, ie, output restricted to the
limits (also steady state values) '0' and '1', with saturating linear functions for $\sigma$, while
Hopfield's circuit assumes bipolar neurons (outputs vary between '-1' and '1'), unipolar
weights and hard-limiting (or sigmoidal) activation functions. The inputs of Kohonen
networks are organized on the unit (hyper) sphere, whereby partitions are measured in
terms of the (solid) angles at the centroid of the (hyper) sphere and the performance of the
processor, expressed in terms of the settling (ie, convergence) characteristics, is strongly
influenced by the (angle of) separation between mutually distinct clusters. In contrast,
the inputs of Hopfield's circuit are organized on the extended (hyper) cube $[-1, 1]^n$ and, as
established by Amit (1989), the circuit settles to an appropriate attractor (fixed point), depending
on the initial location of the state vector, only if the transformation incorporated is non-
ergodic. A weight matrix expressed as the superposition of the self outer products of the
desired attractors, subject to an annulment of the main diagonal, has been shown to be
sufficient to represent (nearly) orthogonal attractors and empirical investigations have
revealed that the number of attractors that can be stored with a reasonable degree of
recall is of the order of 15% of the number of participating nodes. Both networks consider
information propagation delays between lateral nodes to be of unit magnitude and im-
pose symmetry in the interaction between past outputs and current evaluation through
symmetry in the matrix of weights, and this symmetry together with an identicality of
the nonlinear activation function at the various nodes reflects the homogeneous nature
of processing. In these networks, termed associative and explored with connotations of
memory (ie, storage and recall), convergence of the dynamics is with respect to an ob-
jective function which has been shown, in the case of Hopfield's circuit, to be related to
the Hamiltonian (Lyapunov function) familiar in studies of Ising spin systems and this
aspect is exploited when these networks, or their variants, are employed in (optimal)
search problems, eg, solution of the 'Traveling Salesman Problem,' or routing in VLSI circuits.
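
The weight prescription mentioned in the note above (superposition of the self outer products of the desired attractors, with the main diagonal annulled) can be sketched as follows. The stored patterns, the asynchronous update order, the number of sweeps and the tie rule at zero are assumptions made for this illustration.

```python
def hopfield_weights(patterns):
    """Sum of self outer products of the patterns, with a zero main diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, cue, sweeps=5):
    """Asynchronous hard-limited updates starting from a (partial) cue."""
    s = list(cue)
    for _ in range(sweeps):
        for j in range(len(s)):
            eta = sum(w[j][i] * s[i] for i in range(len(s)))
            s[j] = 1 if eta >= 0 else -1
    return s


# Two mutually orthogonal bipolar attractors on eight nodes.
stored = [[1, -1, 1, -1, 1, -1, 1, -1], [1, 1, 1, 1, -1, -1, -1, -1]]
w = hopfield_weights(stored)
cue = [-1, -1, 1, -1, 1, -1, 1, -1]  # first attractor with its first element flipped
recovered = recall(w, cue)
```

The resulting weight matrix is symmetric, as the note requires, and the corrupted cue settles to the nearest stored attractor.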
Narendra & Parthasarathy (1990) have discussed dynamical pro-
cessors obtained by looping back the output of a feed-forward network
(of conventional neurons): this form of computation, a generalization
of recurrence considered in Hopfield's circuit, necessitates the number
of outputs to be the same as that of inputs. Recurrent computation
with probabilistic state transitions has been the focus of Boltzmann
machines, suggested by Hinton, Sejnowski & Ackley (1984) (also see
Hecht-Nielsen, 1990; Haykin, 1994), which are essentially Hopfield
circuits wherein transition of states (ie, response of activation functions
$\sigma$, which are allowed to take discrete values -1 or 1) in the processing
nodes is governed by the Boltzmann distribution[39]

$$P\big(\sigma(\eta_j) \to -\sigma(\eta_j) \mid \eta_j\big) = \cdots$$

A further generalization of recurrent computation has been incorpo- 
rated in the Bidirectional associative memories (BAM) of Kosko (1987), 

[39] In this expression $j$ and $i$ indicate indices on the single layer of $n_u$ processing nodes,
$H$, the Hamiltonian, or energy function, of the Boltzmann Machine is given by

$$H = -\sum_{j=1} \sum_{i=1} \cdots$$

$\Delta H_j = -2\sigma(\eta_j)\eta_j$ is the change in the energy function of node $j$ while flipping the
state, $T$ has the connotation of temperature, whose (controlled) reduction freezes the
transitions, and $\sigma_u$ is the unipolar sigmoidal function commonly used as the activation
function of neural processing elements. This network, trained by a procedure of sim-
ulated annealing (Kirkpatrick, Gelatt & Vecchi, 1983), though slow in convergence,
overcomes, in a statistical sense, the annoying aspect of search settling in local minima
rather than the global minimum, common in learning through error backpropagation
and in computations of Hopfield circuit. As the convergence of this network is in dis-
tribution, these machines have been regarded in the literature to be useful in learning
probabilistic maps.
essentially a 2-layer network described, recursively, by



$i = 0, 1, \ldots$, $\mathbf{y}^{(1)}(i) = \mathbf{y}^{(2)}(i) = \mathbf{0}$, $\forall i < 0$,


where $[\mathbf{w}_1, \mathbf{w}_2, \ldots] = W$, the weight matrix $W$ specifying (hetero) asso-
ciative maps between distinct (or dissimilar) processing fields,[40] and
focusing on identification of the possibility of finding convergent pat-
terns in one field, given (partial) cues in the other.

Bidirectional association, in particular the realization of continuous
maps, whose inverse map exists and is continuous, has been the focus
of counter propagation networks (Hecht-Nielsen, 1987b; 1990), which
incorporate a single decision making layer of the competitive kind and
organize the (bidirectional) association between inputs and outputs
through stages of instar and outstar processing (cf, Grossberg, 1982).
The most remarkable aspect of this network is that only one level of
conceptual entities encode, simultaneously, a function and its inverse 
map. A similar attempt at conjoint representation of functions and their 
inverse maps, though with feed forward networks, has been described 
by Usui, Nakauchi & Nakano (1991). Both Hecht-Nielsen and Usui 
et al rely on the sufficiency of a single level of decision making (termed 

[40] Field theoretic investigations into neural information processing have been discussed
by Amari (1983).

three layer networks) for representing the desired functions and their
inverse maps. 

Having seen processors of the hetero-associative and auto-associative
kinds through feed-forward, competitive and recurrent schemes, it 
is natural to get interested, for the sake of completeness, in proces- 
sors which incorporate interaction between layers, each with lateral 
dynamical interaction. Such processors have been investigated by 
Fukushima (1975, 1987) through cognitron and neo-cognitron and 
Carpenter & Grossberg (1986b) in their Adaptive Resonance Theory 
(ART); both efforts have been in visual pattern recognition and induce 
hetero-associative maps. 

Cognitron and Neo-cognitron, essentially a hierarchy (ie, feed-through)
of lateral interaction layers (generally six in number), are claimed to
incorporate translational and rotational invariances. Networks of ART
consist of two key competitive layers, each influencing the other, unidi-
rectionally through an appropriate feed-forward structure. By design, 
these networks incorporate storage, search, comparison and recall of 
patterns and are geared to handle adaptive pattern recognition situ- 
ations by storing (or replacing appropriate stored patterns with) pre- 
sented patterns if the mismatch (distance) with any of the patterns 
already in storage exceeds a threshold: this threshold is ascribed an 
interpretation of the level of vigilance. 
Representational Issues in Neural Signal Processing Architectures


The architecture of neural networks used for signal processing resem- 
bles, largely, a multi-layered neural network, except for the activation 
functions of the final layer, which are all identity maps and, hence, 
trivial from the point of view of categorization: such a formulation is 
equivalent to stating that the outputs of a neural signal processor are 
linear combinations of the outputs of a multi-layered neural network 
(with non-trivial activation functions).[41] Networks with a single layer
of processing realizing functions of the form[42],[43]

$$f(\mathbf{x}) = \sum_{i=1}^{m_1} \cdots, \qquad m_1 = 1, 2, \ldots, \qquad (2.23)$$



have been of principal focus in studies available in the literature (cf, 
Hecht-Nielsen, 1987a; Cybenko, 1989; Mhaskar, 1993; Ya Lin & 
Pinkus, 1993). 
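
As an illustration of outputs formed as linear combinations of the responses of a single layer of nonlinear nodes, the sketch below uses the form commonly associated with Cybenko's result: sigmoids of weighted sums with thresholds, combined linearly. The parameter names and values are assumptions and are not taken from the notation of this chapter.

```python
from math import exp


def sigmoid(v):
    """Unipolar sigmoidal activation."""
    return 1.0 / (1.0 + exp(-v))


def single_hidden_layer(x, hidden_weights, thresholds, output_coeffs):
    """Identity output node acting on one layer of sigmoidal decision elements."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) - t)
              for row, t in zip(hidden_weights, thresholds)]
    return sum(c * h for c, h in zip(output_coeffs, hidden))


def bump(x):
    """Two hypothetical nodes whose difference is close to 1 on (-1, 1) and close to 0 far away."""
    return single_hidden_layer([x], [[10.0], [10.0]], [-10.0, 10.0], [1.0, -1.0])
```

Sums of such localized bumps, with enough nodes, can follow a continuous target closely, which gives an informal picture of why the number of nodes may become large.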


[41] It is common for the architecture of a neural network to be identified by a description
involving an ordered list of the kind $m_0$-$m_1$-...-$m_L$, where $L$ refers to the number of
layers of decision making, $m_0$ the number of elements in patterns incident on the input
layer of the network and $m_i$, $i = 1, 2, \ldots, L$, indicate the number of processing nodes in
the $i$th layer of decision making.

[42] While a scalar function has been indicated, this form can be effortlessly extended to
cases wherein vector outputs are needed.

[43] In the original formulation, the outputs of dynamical processors like Hopfield's circuit,
BAM and Boltzmann machine, in contrast with counter propagation networks, are not
expressed as linear combinations of responses of decision elements. A reformulation of
these processing structures into the framework suggested by the associated equation is
not only obvious, but also offers an opportunity for a (minor) generalization of the original
formulations.
Networks of the functional form in Equation 2.23, usually
identified with three layer neural networks, are claimed, in the litera-
ture, as being sufficient to represent all functions of interest: this claim
is supported by the rigorous exposition of Cybenko (1989), followed by
Vepsalainen (1991), Mhaskar (1993) and others. Functions realized
(as in the above networks) through finite, though unrestricted, linear
combinations of the outputs of a single (hidden) layer of processing,
have been shown through Cybenko's efforts as being dense in the space
of continuous functions.

The activation functions $\sigma$, in Cybenko's claim of denseness of rep-
resentation, are chosen to be sigmoidal, thereby asserting the existence 
of representation, with arbitrary accuracy, for all continuous functions; 
however, the network structure might involve an unappealingly large 
number of nodes in the (single) processing layer. The universality of 
neural networks, though with a single layer of processing, in approxima- 
tion has been established by Hornik, Stinchcombe & White (1989), 
wherein continuity, together with non-constancy, of the activation func- 
tions has been shown to assure denseness of representation in the space 
of continuous functions. 

Hecht-Nielsen (1987c) and later Shrier, Barron & Gilstrap
(1987), Girosi & Poggio (1991), Cotter & Guillerm (1992), Kurková
(1992) and others have studied the representational potential of net-
works involving two layers of processing, again identified with three
layer neural networks, realizing functions of the form

$$y(\mathbf{x}) = \sum_{j=1}^{m_2} \sum_{i=1}^{m_1} \cdots \qquad (2.24)$$

Of particular interest in these studies is a function representation The-
orem due to Sprecher (1965), an improvisation on that established by
Kolmogorov (1957b) in connection with a solution (in the sense of a
denial of the hypothesis) of the 13th problem of Hilbert.

The Theorems of Kolmogorov and Sprecher suggest an approxima-
tion scheme similar, in form, to the above equation and, on the strength
of this similarity of form, a representation of all continuous functions
described on a bounded linear subspace (of $n$ (= $m_0$) dimensions) of the
Cartesian product space, a typical choice being the Euclidean
space $[0, 1]^n$, is claimed to be admitted by a network comprising exactly
$n(n + 1)$ nodes in the first layer of processing and $2n + 1$ nodes in the
second (and final) decision making layer.

Issues in Concept Representation

Representation of perceptually relevant operations is the principal fo-
cus of neural networks and it is common to find functions represented by
nodes in the ensemble being associated with concepts. In this spirit, a
layered (feed-forward, non-evolutionary) neural network is interpreted
as realizing a hierarchy of concepts. Typically, concepts are expected
to highlight specific relative organization of assignments in the incident
input patterns and concepts are often interpreted as predicates of
logic, generally of zeroth[44] order (ie, propositional calculus), operating
on elements within the input pattern.

Neurons with hard-limiting activation functions, studied originally 
by McCulloch & Pitts (1943), and their networks have been shown to 
represent Boolean functions, essentially formulae of the propositional 
calculus. Sigmoidal activation functions have been shown, in the lit- 
erature, to enable a representation of formulae in the calculus of fuzzy 
propositions noting that the graded response provided by such activa- 
tion functions is an excellent candidate for being a (set) membership 
function. 
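
The representation of Boolean functions by hard-limiting neurons can be illustrated with a few threshold units. The weights and thresholds below are one conventional choice, given here as an assumption, and the exclusive-or is built as a small network since it is not linearly separable.

```python
def mp_neuron(weights, threshold, inputs):
    """Hard-limiting neuron on binary inputs: fires when the weighted sum reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0


def AND(a, b):
    return mp_neuron([1, 1], 2, [a, b])


def OR(a, b):
    return mp_neuron([1, 1], 1, [a, b])


def NOT(a):
    return mp_neuron([-1], 0, [a])


def XOR(a, b):
    # A two-stage network: (a OR b) AND NOT (a AND b).
    return AND(OR(a, b), NOT(AND(a, b)))


truth_table = [(a, b, XOR(a, b)) for a in (0, 1) for b in (0, 1)]
```

Each unit realizes one propositional connective, and layering the units composes them into a formula of the propositional calculus.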

At a conceptual level, discrimination due to piece-wise monotonic 
activation functions, which are described, essentially, on localized (pos- 
sibly compact) support, is similar to that provided by sigmoidal activa- 
tion functions, in that, the response of (isolated) neurons is a statement 
of the occurrence of specific relative organization of assignments in 
the relevant input patterns. Hence neurons with such activation func- 
tions are considered to represent predicates of an appropriate logical 
system.[45]

[44] Predicates of zeroth order are more technically known as propositions. I have pre-
ferred to use the more general term predicates to maintain reasonable compatibility 
with the usage of the term predicates by Minsky & Papert (1969). At an abstract 
level, predicates really are relations defined to capture specific logical correspondences 
between relevant entities: in the present discussion, logical association is sought between 
elements of (input) patterns. 

[45] Predicates represented by piece-wise monotonic activation functions generally rest
on supports that have several, mutually non-overlapping, connected regions within the
input pattern and distinct (connected) components of the support correspond to distinct
monotonic segments of the activation function.

Specificity of relative organization is decided completely by the pa-
rameters (mainly weights, thresholds, as incorporated by the abstract
translation function $\theta$, and, relatively infrequently, the abstract ampli-
fication function $a$) and thus an interpretation of parameters as tem-
plates of the perceptual entities being sought to be represented would
not be inappropriate. Neural signal processing then is akin to tem-
plate matching; however, the crux of the problem lies in deciding, by
automatic means, relevant templates given (valid) examples of associ-
ation between instances of input patterns and (perceptually grounded)
responses, actions, or decisions.

Concepts, identified essentially as categories, are distinguished by van Loocke
(1994) as being taxonomic or complexive. This distinction is made on the basis
of the existence of a common core of attributes, or features, in the instances,
or examples, meant to suggest the concept. Implicit is the assumption that
no formal description of the concepts, or equivalently categories, is available
and the only description that can be given is in
terms of (illustrative) examples. In this framework, a category having
instances containing a (non-trivial) common core, or essence, is consid-
ered as taxonomical and the concept, or category label, is associated
with this common core, whereas complexive concepts refer to the ab-
sence of commonality in the instances and the labels of such concepts
have to necessarily be linked to all instances.

[Figure 2.4: Types of concepts. Panels: Taxonomical Category; Complexive Category]

A majority of neural network architectures and situations of engi-
neering interest are concerned with the representation of taxonomical
concepts, and the sole effort of generalization, during training, is to de-
rive the (relevant) common core given the examples of input-output as-
sociation, ie, knowledge about processor functionality. Complexive con-
cepts arise in information processing contexts like natural language
understanding, or pre-attentive vision, essentially situations wherein
the (representative) instances have semantic import, rather than syn-
tactic relevance (as in taxonomical categories), and their representation
has been the focus of the architectures of ART.


2.4 Summary 

Representation of concepts, mainly of the taxonomical kind, is the fo-
cus of information processing and the continued increase in the in-
volvement of automated information processing and, ultimately, the
automation of intelligence, in nearly every aspect of human existence
has necessitated information processing, typically decision making and
estimation, to be supported in situations wherein the conditions ensur-
ing satisfactory performance of linear procedures cannot be guaranteed
and exhaustive formal (symbolic) statements of processing functional-
ity are almost impossible to be enunciated. Nonlinear processing ap-
proaches have been sought to overcome the serious limitations of linear
approaches and, of these, neural network based schemes, originally in-
vestigated as models of (human) abilities, are prominent.

Connectionist information processing systems, essentially neural 
network schemes of processor realization designed to represent the 
inherent structure specified through examples of input-output associ- 
ations rather than merely recreate mappings recorded in a training 
set (as is the case in the approach based on the formalism of Turing 
Machines), provide a natural framework for the synthesis of informa- 
tion handling when perceptual and/or cognitive interpretations are at- 
tached to the processing steps. This framework is sufficiently abstract 
and universality of processor representation, generally considered the
sole preserve of Turing Machines (and other automata equated to this
formalism by Church's thesis; cf, Lewis & Papadimitriou, 1981),
cannot easily be denied to neural information processing. (However,
adequate comparative evaluation of neural networks in relation to Tur- 
ing Machines, in the sense of equivalence, is not yet available.) 
A representation ... implies the existence of two related but
functionally separate worlds: the represented world and the
representing world. In order to specify a representation
completely, ... one must state: (1) what the represented world
is; (2) what the representing world is; (3) what aspects of the
represented world are being modeled; (4) what aspects of the
representing world are doing the modeling; and (5) what are
the correspondences between the two worlds. A representation
is really a representational system that includes all five aspects.

— Stephen E Palmer 
in Fundamental Aspects of Cognitive Representation, 
Chapter 9 of Cognition and Categorization, 
edited by Eleanor Rosch and Barbara Lloyd, 
Lawrence Erlbaum Associates, Publishers, 
Hillsdale, New Jersey, 1978

A study of representation of (signal) processors in isolated neurons is
essentially a study of mapping the input and output spaces of the
processor into those of the isolated neuron and choosing appropriate weight
and threshold values so as to capture the required association between
the input and output spaces in terms of the association mechanism
characteristic of the neuron. The dependency of outputs on inputs in
isolated neurons is commonly expressed as a nonlinear evaluation of
a projection of the input; the weight influences the projection and the
threshold participates in the nonlinear evaluation.

Projections induce a partitioning on the domain and consequently 
processors incorporating projections in their functionality establish 
mappings that associate multiple inputs to the same output value. In 
such a processing situation relative evaluations between outputs
(decisions) do not, in general, reflect a relative assessment of the
corresponding inputs. However, by a suitable restriction of the input
space (the nature of the restriction is not independent of the class of
projection operators) the desired preservation of relationships between
inputs in the corresponding outputs is achieved.

In the following I establish the existence of weights that preserve
discrete spaces in a one-dimensional (linear) subspace of $\mathbb{R}^n$.
These weights, by mapping functions defined on discrete spaces to
sequences, reduce the problem of learning to one of an enumeration of the
weight and a problem of search, in a linear order, for the threshold. I
also establish that the notion of preservation is independent of the radix
of numbering and identify, through constructive procedures, with every
non-null weight in $\mathbb{R}^n$ the existence of a discrete subset of
$\mathbb{R}^n$ which is preserved in a one-dimensional linear space.

The approach in this thesis is to provide an analysis that is inde- 
pendent of the interpretative framework and all signals (patterns) will 
thus be considered as being no more than members of vector spaces of 
appropriate dimension. Input space preservation, established initially
on binary vector collections and subsequently extended to more general 
discrete spaces, allows the identification of a subspace that will allow 
a parameterized description of the realized functions. Such an iden- 
tification facilitates an easy characterization of the representation of 
processors in isolated neurons and networks of neurons. 

Input space preservation and identification of certain discrete spaces
preserved in one-dimensional spaces are considered in § 3.1; this
discussion, though initiated in the context of isolated neurons, is
relevant for networks of neurons too.^ The implications of preservation on
function representation, in particular linearly separable dichotomies, are
studied in § 3.2 (p. 138). Learning and generalization issues, as
influenced by preservation of input spaces, are studied in § 3.3 (p. 154).
Extension of the notion of preservation to more general discrete spaces is
taken up in § 3.4 (p. 171).

^Representational issues in layered networks of neurons are considered in
Chapters 4 and 5.

3.1 Preservation of Discrete Input Spaces

Consider the formal model of an isolated neuron in steady state:

$\eta(x) = w \cdot x$    (3.1a)

$y(x) = \sigma(\eta(x))$    (3.1b)

In the above model, the input vectors $x$ are considered presented to the
neuron from a subset of $\mathbb{R}^n$, $n = 1, 2, \ldots$, neuron weights
$w$ are drawn from $\mathbb{R}^n$ and the nature of outputs
$y \in \mathcal{Y}$ is decided by the choice of $\sigma$ as described in
Chapter 2. I will begin by considering the case wherein inputs are
presented from $\mathcal{B}^n = \{-1, +1\}^n$, clearly a discrete subset of
$\mathbb{R}^n$ which has the vector $0 \in \mathbb{R}^n$ as its centroid
(or origin).^

For notational convenience the space $\mathcal{B}^1 = \{-1, +1\}$ is
denoted by $\mathcal{B}$. In the literature related to neural networks the
collection of $n$-dimensional binary vectors in $\mathbb{R}^n$ is commonly
interpreted as the Boolean hyper-cube of dimensionality $n$ and this term
will be used in the subsequent discussion.^ Elements in the vectors
belonging to $\mathcal{B}^n$
are related to statements asserting the presence (or absence) of certain 

^All elements of the input vector x are considered to be compatible and thus only
regular structures will be investigated. While it is not impossible to associate different
symbol spaces with the different elements of x, such an association would immediately 
violate the symmetry of reasoning and, hence, this asymmetric choice (which would have, 
inevitably, led to irregular discrete symbol spaces) has not been made. Thus, the discrete 
spaces in this thesis will all be based on n-dimensional hyper-cubes, for an appropriate 
value of n rather than the more general situation of parallelepipeds in n dimensions. 
Generality in the characterization, however, is not lost as a result of restricting the 
discussion to regular structures. 

The Boolean hyper-cube is the ground set of a Boolean algebra. Topologically this set
incorporates the geometrical features of cubes in three dimensional Euclidean space.

features of interest in the members of the observation space. The
features are, however, given by the specific framework of interpretation
associated with the input and observation spaces.

Denote by $\mathcal{L}_w$ the one-dimensional linear subspace^ (of
$\mathbb{R}^n$) described by a weight $w$:

$\mathcal{L}_w = \{\lambda w \mid \lambda \in \mathbb{R}\}$

Note that $\mathcal{L}_w$ is isomorphic to $\mathbb{R}$, the real line. In
Equation 3.1a $\eta$ involves an evaluation of the scalar product between
$w$ and $x$. As the scalar product induces a partitioning of the input
space $\mathbb{R}^n$ in terms of hyper planes^ and the role of $\eta$ is to
provide an ordering of these hyper planes through the (natural) order in
$\mathcal{L}_w$ for a choice of weight $w$ in the neuron, the following is
introduced.


Definition 3.1.1 A many-to-one transformation, say $f: A_d \to A_r$,
is said to preserve all points of a subset $A$ of $A_d$,
$A \subseteq A_d$, in $A_r$ if there exists a subset, say $A_s$, of $A_r$,
$A_s \subseteq A_r$, with the following properties.


1. Uniqueness preservation. All points of $A_s$ can be put in one-one
correspondence with $A$.


^A subspace is a subset of a vector space (commonly $\mathbb{R}^n$) that is closed with
respect to the operations of addition and multiplication by a scalar (cf. Ito, 1987).

^The image (under a translation) of a subspace of a vector space (commonly $\mathbb{R}^n$)
with a one-dimensional quotient space is termed a hyper plane. A subset $\pi \subset X$ is
a hyper plane in a vector space $X$ over a field $K$ if and only if
$\pi = \{x \mid f(x) = \alpha\}$ for $\alpha \in K$ and a certain non-zero linear
functional $f \in X'$ (cf. Ito, 1987).

2. Order preservation. For any partial ordering relation, say $\preceq$,
defined on $A_d$ there exists a partial ordering relation on $A_r$, denoted
by $\leq$, such that

$\forall a_1, a_2 \in A: \quad (a_1, a_2) \in\, \preceq \;\Longrightarrow\; (a'_1, a'_2) \in\, \leq,$

where $a'_1$ and $a'_2$ are the points in $A_s$ corresponding to $a_1$ and
$a_2$, respectively.

3. Regularity condition. The set $A_s \subseteq A_r$ is in one-one
correspondence with a set of rationals, say Ar, given by

^ ^ ^ I y 

Ar ~ {oiiji, O'xjt d" 2, . . . (Xijt d* 2ki I , jt ~ 2 *> 

1=1 

for some at, ki = 1 , 2 , .;i = l,2,. .k;k=l,2,. .. 

The sets $A_d$ and $A_r$ are not, in general, the same and, hence, the
ordering relations related to these sets have been considered different.
Further, $A_d \subseteq \mathbb{R}^n$ and $A_r \subseteq \mathbb{R}$ from
Equation 3.1. While in the latter set 'less than or equal to' (denoted by
$\leq$) is a natural ordering relation, no such natural ordering relation
exists for $A_d$ when the dimensionality $n$ is larger than unity. Thus the
relation $\preceq$ is assumed given. A preservation of the partial ordering
on $A_d$ in terms of a partial ordering on $A_r$ is essential to allow
relationships between inputs to be preserved in relative evaluations of
outputs.

Regularity in the points of $A_s$ structures the input space $A$ to be
composed of certain unions of spaces whose image in $A_r$ is a collection
of uniformly spaced points. Such a restriction of the input space points
provides the representational advantage of specifying the set $A$
recursively from a basic set that is preserved in an appropriate subset of
$A_r$. Regularity operating in conjunction with order preservation enforces
symmetry in the subset $A$ that is put in one-one correspondence with
$A_s$; however, this aspect will not be invoked in the present
investigation in view of the nonlinear nature of the function operating on
$A_s$.

As the operation of scalar product (also termed inner product) is 
uniquely identified with weights, input space partitioning induced by 
inner product will be attributed to the weights used in the operation. 
The bilinearity of scalar products (ie, linearity with respect to inputs 
given a weight as well as linearity with respect to weights given an 
input) assures one-one correspondence, order preservation and regu- 
larity in certain discrete subsets of the n-dimensional Euclidean space. 
In the following the existence of operations enabling a preservation of 
discrete spaces will be considered. 

Theorem 3.1.1 There exist weights $w$ in isolated neurons which
preserve, distinctly, all points of the $n$-dimensional Boolean hyper-cube
$\mathcal{B}^n$ in the one-dimensional space $\mathcal{L}_w$ for all $n$,
$n = 1, 2, \ldots$

The discrete space $\mathcal{B}^2$ and a few of the weights that allow
a preservation of all its points in a one-dimensional space in the
direction of the weight vector are illustrated in Figure 3.1. This
illustration explicitly indicates the aspects of one-one correspondence,
regularity and the natural ordering in the points along the direction of
the chosen weight.


As each of the elements of the input vector $x$ takes on one of two
possibilities, +1 and -1, a weight vector in which the $n$ elements are
assigned unique powers of 2 will ensure that the inner product $w \cdot x$
maps the $2^n$ distinct points of $\mathcal{B}^n$ to distinct points
(numbers) in $\mathbb{R}$. (The weight directions shown in Figure 3.1
conform to this assignment.) For example, a weight chosen as
$w_i = 2^{i-1}$, $i = 1, 2, \ldots, n$, is a good candidate for
establishing a one-one correspondence between $\mathcal{B}^n$ and a
discrete subset (of $2^n$ points) in $\mathcal{L}_w$.

[Figure 3.1: Illustration of the discrete space, with the vertices (-1,1),
(1,1) and (-1,-1) marked together with the weight directions.]

It is immediately apparent that when $\mathcal{B}^n$, for all $n$, is
interpreted as a poset with the help of a partial ordering relation, say
$\preceq$, this partial ordering is preserved in the corresponding points
in $\mathcal{L}_w$, ie,

$x_1 \preceq x_2 \;\Longrightarrow\; w \cdot x_1 \leq w \cdot x_2, \quad \forall x_1, x_2 \in \mathcal{B}^n,$

where $\leq$ is the relation 'less than or equal to,' also known as 'not
greater than.' This preservation of partial ordering is a consequence of
the axioms of inner product operation.

Weights given by the assignment $w_i = \pm 2^{i-1}$, $i = 1, 2, \ldots, n$,
satisfy the regularity condition required for preservance.

□ 
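The claim of Theorem 3.1.1 can be checked numerically for small $n$. The following is a minimal sketch, not part of the original text: the function name `images` and the choice $n = 4$ are illustrative assumptions. It evaluates $w \cdot x$ over all vertices of $\{-1,+1\}^n$ for $w_i = 2^{i-1}$ and inspects distinctness and spacing of the images.

```python
from itertools import product

def images(weight):
    """Map every vertex x of the Boolean hyper-cube {-1,+1}^n to w.x."""
    n = len(weight)
    return {x: sum(w_i * x_i for w_i, x_i in zip(weight, x))
            for x in product((-1, +1), repeat=n)}

n = 4                                   # illustrative dimension (assumption)
w = [2 ** i for i in range(n)]          # w_i = 2^(i-1), i = 1..n
values = sorted(images(w).values())
gaps = {b - a for a, b in zip(values, values[1:])}

print(values)   # the odd integers from -(2^n - 1) to +(2^n - 1)
print(gaps)     # a single gap value: the images are uniformly spaced
```

For this weight the $2^n$ images are distinct and equally spaced, which is the one-one correspondence and regularity invoked above.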

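The above theorem suggests multiple weights that accommodate a
preservation of the same discrete space. In order to facilitate a
characterization of these weights, denote by $\mathcal{B}^n(\zeta, \vartheta)$
the discrete space obtained by scaling and translating the Boolean
hyper-cube $\mathcal{B}^n$:

$\mathcal{B}^n(\zeta, \vartheta) = \prod_{i=1}^{n} \{-\zeta + \vartheta_i, \; +\zeta + \vartheta_i\},$

where $\zeta \in \mathbb{R}^+$ is the scale factor and
$\vartheta = [\vartheta_1, \vartheta_2, \ldots, \vartheta_n] \in \mathbb{R}^n$
is the translation applied to all points of $\mathcal{B}^n$. (Note that
$\mathcal{B}^n = \mathcal{B}^n(1, 0)$ and $\mathcal{B} = \mathcal{B}^1(1, 0)$.)
Weights accommodating a preservation of points belonging to $\mathcal{B}^n$
are described as in the following.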

Theorem 3.1.2 Weights $w$ that preserve $\mathcal{B}^n$ in $\mathcal{L}_w$
are given by the assignment

$w_i \in \bigcup_{j=1}^{n} \mathcal{B}(\alpha 2^{j-1}, 0), \quad \alpha \in \mathbb{R}^+, \; i = 1, 2, \ldots, n,$    (3.2)

subject to the restriction that $|w_i| \neq |w_k|$ for all
$i, k = 1, 2, \ldots, n$, $i \neq k$.

Proof: The property that every non-null weight vector preserves partial
ordering under inner products, invoked in the proof of the earlier
statement, necessitates only a one-one correspondence between
$\mathcal{B}^n$ and a discrete subset of $\mathcal{L}_w$ to be established.
Since weight assignment is in accordance with the prescription given in the
earlier statement, the one-one correspondence too is readily evident.

Regularity in the collection of points in $\mathcal{L}_w$ which is put in
one-one correspondence with $\mathcal{B}^n$ is not affected by scaling and
translation as both operations apply uniformly to all points in
$\mathcal{B}^n$.

□ 

Theorem 3.1.2 suggests the space of possible assignments up to a
common positive scale factor; this scale factor has been denoted by
$\alpha$. As indicated in Figure 3.1 (p. 114) preservance is affected by
the direction of the chosen weight $w$, with the common scale factor (which
influences the norm) serving to control the separation between adjacent
points corresponding to the image of $\mathcal{B}^n$ in $\mathcal{L}_w$.
Preservance of points belonging to the discrete space in $\mathcal{L}_w$
for a weight $w$ shows directional dependence as the uniqueness in the
images of points in $\mathcal{B}^n$ under inner product is directionally
dependent.

It is easy to accept preservance of $\mathcal{B}^n$ in $\mathcal{L}_w$
when the elements of $w$ are given by $w_i = 2^{i-1}$,
$i = 1, 2, \ldots, n$. The following examples illustrate the nature of
preservation of $\mathcal{B}^n$ for other choices of $w$ governed by the
assignment in Theorem 3.1.2. In these examples the common scale factor
$\alpha$ has been retained as a variable to highlight the dependence of
preservance only on the direction of the weight vector and not the scale
factor. For convenience, the examples are restricted to binary collections
in Euclidean spaces whose dimensionality is small.


Example 1: Case n = 3, w1 = α, w2 = -2α, w3 = 4α.

    x      -1-1-1    -1-1+1    -1+1-1    -1+1+1
    w·x     -3α       -1α       -7α       -5α

    x      +1-1-1    +1-1+1    +1+1-1    +1+1+1
    w·x     +5α       +7α       +1α       +3α

Example 2: Case n = 3, w1 = -4α, w2 = 1α, w3 = -2α.

    x      -1-1-1    -1-1+1    -1+1-1    -1+1+1
    w·x     +5α       +1α       +7α       +3α

    x      +1-1-1    +1-1+1    +1+1-1    +1+1+1
    w·x     -3α       -7α       -1α       -5α

Example 3: Case n = 4, w1 = -8α, w2 = 1α, w3 = 2α, w4 = 4α.

    x      -1-1-1-1    -1-1-1+1    -1-1+1-1    -1-1+1+1
    w·x     +01α        +09α        +05α        +13α

    x      -1+1-1-1    -1+1-1+1    -1+1+1-1    -1+1+1+1
    w·x     +03α        +11α        +07α        +15α

    x      +1-1-1-1    +1-1-1+1    +1-1+1-1    +1-1+1+1
    w·x     -15α        -07α        -11α        -03α

    x      +1+1-1-1    +1+1-1+1    +1+1+1-1    +1+1+1+1
    w·x     -13α        -05α        -09α        -01α
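The tabulated inner products can be reproduced mechanically. The sketch below is an illustration only: it assumes $\alpha = 1$ and reads each printed vertex string as $(x_1, x_2, x_3, x_4)$ from left to right, a reading under which the entries of Example 3 are recovered exactly; the helper name `inner` is hypothetical.

```python
from itertools import product

def inner(weight, x):
    """Scalar product w.x of two equal-length sequences."""
    return sum(w_i * x_i for w_i, x_i in zip(weight, x))

alpha = 1                                        # assumed scale factor
w = [-8 * alpha, 1 * alpha, 2 * alpha, 4 * alpha]  # weights of Example 3

# Vertices are generated in the same order as the rows of the example:
# (-1,-1,-1,-1), (-1,-1,-1,+1), ..., (+1,+1,+1,+1).
table = {x: inner(w, x) for x in product((-1, +1), repeat=4)}

for x, value in table.items():
    print(x, value)
```

The sixteen values are the odd integers between -15 and +15, each occurring once, as expected for a weight whose elements have distinct power-of-2 magnitudes.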
Without the necessity of a proof, the above examples allow the
following to be easily appreciated.

Proposition 3.1.1 Given a weight $w$ governed by the assignment
in Theorem 3.1.2 (with $\alpha \in \mathbb{R}^+$) such that the Boolean
hyper-cube $\mathcal{B}^n$ is preserved in $\mathcal{L}_w$, the points in
the linear sub-space which are in one-one correspondence with the $2^n$
vertices of $\mathcal{B}^n$ are

1. identified on the basis of the numerical value of the binary
representation as decided by the choice of weights,

2. equidistant from adjacent members (implication of regularity) and

3. restricted to the interval
$[-\alpha(2^n - 1), +\alpha(2^n - 1)] \subset \mathbb{R}$.

The positive scale factor $\alpha$ serves to enrich the space of weights
that accommodate a preservation of $\mathcal{B}^n$. In the following I will
denote the collection of weights $w$ that preserve $\mathcal{B}^n$ in
$\mathcal{L}_w$ by $P_n$, the suffix $n$ indicating the dimensionality of
the input space over which the weights are applicable. For convenience of
analysis $P_n(\alpha)$ will be used to denote the restriction^ of weights
given $\alpha$. The set $P_n(\alpha)$, for any $\alpha \in \mathbb{R}^+$,
contains weights of identical norm, the norm being a function of $\alpha$.
Weights $w$ that preserve all points of $\mathcal{B}^n$ in $\mathcal{L}_w$
are specified up to a common scale factor by the following.

^It is not incorrect to state that $P_n(\alpha)$ is the coset $(\alpha)$ of $P_n(1)$ in $P_n$.

Corollary to Theorem 3.1.2 The entire collection of weights $w$ that
preserve^ $\mathcal{B}^n$ in $\mathcal{L}_w$ given any
$\alpha \in \mathbb{R}^+$ is expressed as in the following.


Pnia) = 


Ue(a2'-\0) 


\tzzl 


\ 


U U U (e'(a 2 ^-',QJ X )), 

t=l j = l k=:l 

where $0_j$ is the zero (origin) of $\mathbb{R}^j$, $j = 1, 2, \ldots$, and
$A^0$, the 0-fold Cartesian product of any space $A$, is $\emptyset$, the
empty set.


Note that the set 

U [J [J (S*(a2^-\0JxS"-‘(a2'=-\0— )) 

r=l j=l k-l 

describes weights which have at least two elements with identical mag- 
nitude and these are excluded from 

$\left(\bigcup_{i=1}^{n} \mathcal{B}(\alpha 2^{i-1}, 0)\right)^n,$

the space of possible weights. VC e € 3R.) In 

the above statement the space of possible weights and the collection of
weights that have two or more elements with identical magnitude are
expressed in terms of scaled Boolean hyper-cubes.

Proposition 3.1.2 The number of preservance weights for an input
space of $n$ dimensions, $n = 1, 2, \ldots$, is given for any $\alpha$ by

$|P_n(\alpha)| = n!\, 2^n$

^Note that as the weight $w$ changes, the one-dimensional space $\mathcal{L}_w$, by virtue
of being the linear sub-space of $\mathbb{R}^n$ in the direction of $w$, also changes correspondingly.
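The count $n!\,2^n$ can be confirmed by brute force for small $n$. The sketch below is an illustration under stated assumptions: it takes $\alpha = 1$ by default, draws every element of a candidate weight from $\{\pm\alpha 2^{j-1}\}$, and treats a weight as a preservance weight when the $2^n$ images are distinct and uniformly spaced. The function names `preserves` and `count_preservance_weights` are hypothetical.

```python
from itertools import product
from math import factorial

def preserves(weight):
    """True if w.x is distinct and uniformly spaced over {-1,+1}^n."""
    n = len(weight)
    vals = sorted(sum(w * x for w, x in zip(weight, xs))
                  for xs in product((-1, 1), repeat=n))
    gaps = {b - a for a, b in zip(vals, vals[1:])}
    return len(set(vals)) == 2 ** n and len(gaps) == 1

def count_preservance_weights(n, alpha=1):
    """Count candidates in (union of B(alpha 2^(j-1), 0))^n that preserve B^n."""
    magnitudes = [alpha * 2 ** j for j in range(n)]
    candidates = [s * m for m in magnitudes for s in (1, -1)]
    return sum(preserves(w) for w in product(candidates, repeat=n))

for n in (1, 2, 3):
    print(n, count_preservance_weights(n), factorial(n) * 2 ** n)
```

Candidates with two elements of equal magnitude always produce coinciding images, so only the sign and order variations of distinct magnitudes survive the check.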
The above two statements are immediate consequences of Theorem 3.1.2
(p. 115) and, hence, proofs are not required. Proposition 3.1.2
suggests the number of distinct directions along which weight vectors
can be chosen to accommodate a preservation of points belonging to
$\mathcal{B}^n$. Noting that weights admitting preservance of a Boolean
hyper-cube belong to scaled Boolean hyper-cubes, it is of interest to know
the possibility of preserving the scaled and translated Boolean
hyper-cubes $\mathcal{B}^n(\zeta, \vartheta)$ for appropriate values of
$\zeta$ and $\vartheta$. From the definition of Boolean hyper-cubes, the
following is evident.

Proposition 3.1.3 Weights $w$ that preserve the $n$-dimensional Boolean
hyper-cube $\mathcal{B}^n$ in $\mathcal{L}_w$ also preserve
$\mathcal{B}^n(\zeta, \vartheta)$ $\forall\, \zeta \in \mathbb{R}^+$,
$\vartheta \in \mathbb{R}^n$.

It is simple to observe that the scale factor $\zeta$ does not alter the
preservation property as long as $\zeta$ is non-null. Similarly, a
translation of the origin by $\vartheta$ adds the component
$w \cdot \vartheta$ to the inner product and, as this addition applies
uniformly to all nodes of $\mathcal{B}^n$, preservation effected by a
weight is unaltered due to scaling and (origin) translation of the input
space.

□
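The argument rests on the identity $w \cdot (\zeta b + \vartheta) = \zeta\,(w \cdot b) + w \cdot \vartheta$ for every vertex $b$ of the unscaled hyper-cube. A small sketch follows, with illustrative values of $w$, $\zeta$ and $\vartheta$ chosen here as assumptions (they are not taken from the text); exact rational arithmetic is used so that equality tests are safe.

```python
from fractions import Fraction
from itertools import product

def cube(n, zeta=1, theta=None):
    """Vertices of the scaled and translated hyper-cube B^n(zeta, theta)."""
    theta = theta or [0] * n
    return [tuple(zeta * b + t for b, t in zip(v, theta))
            for v in product((-1, 1), repeat=n)]

def inner(w, x):
    return sum(w_i * x_i for w_i, x_i in zip(w, x))

w = [1, -2, 4]                 # a preservance weight with alpha = 1
zeta = Fraction(3, 2)          # illustrative scale factor (assumption)
theta = [5, -1, 2]             # illustrative translation (assumption)

base = [inner(w, x) for x in cube(3)]
moved = [inner(w, x) for x in cube(3, zeta, theta)]
offset = inner(w, theta)

# Every image is scaled by zeta and shifted by the same amount w.theta.
print(moved == [zeta * b + offset for b in base])
print(min(moved), max(moved))
```

Because the map from the old images to the new ones is the same affine function for all vertices, distinctness and uniform spacing carry over; only the spacing (scaled by $\zeta$) and the location (shifted by $w \cdot \vartheta$) change.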

Though regularity is unaffected by scaling, the separation between
adjacent points forming images of points belonging to
$\mathcal{B}^n(\zeta, \vartheta)$ increases with the positive scale factor
$\zeta$. In contrast, translation does not alter the images relative to
each other. Note that as a consequence of scaling and translation, the
points in $\mathcal{L}_w$ which preserve the points of
$\mathcal{B}^n(\zeta, \vartheta)$, under a weight $w$ governed by the
assignment in Theorem 3.1.2 (p. 115), are limited to the interval

$[-\alpha\zeta(2^n - 1) + w \cdot \vartheta, \; +\alpha\zeta(2^n - 1) + w \cdot \vartheta] = \alpha\zeta(2^n - 1)\,[-1, +1] + w \cdot \vartheta.$

These bounds reflect scaling in the weights as well as inputs.

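In the following, the points in $\mathcal{L}_w$ that are put in one-one
correspondence with those of $\mathcal{B}^n(\zeta, \vartheta)$ given
$\zeta \in \mathbb{R}^+$ and $\vartheta \in \mathbb{R}^n$ will be denoted
by $\mathcal{L}_w(\alpha, \mathcal{B}^n(\zeta, \vartheta))$, where
$\alpha \in \mathbb{R}^+$ is the scale factor associated with the weights
involved in the preservation of $\mathcal{B}^n(\zeta, \vartheta)$ in
$\mathcal{L}_w$. (Note that
$|\mathcal{L}_w(\alpha, \mathcal{B}^n(\zeta, \vartheta))| = |\mathcal{B}^n(\zeta, \vartheta)| = 2^n$
for all $\alpha, \zeta \in \mathbb{R}^+$, $\vartheta \in \mathbb{R}^n$,
$n = 1, 2, \ldots$) The set
$\mathcal{L}_w(\alpha, \mathcal{B}^n(\zeta, \vartheta))$ consists of points
(vectors) drawn from a one-dimensional space of $\mathbb{R}^n$ described by
the weight $w$, and the associated collection of scalar products is a
discrete subset of $\mathbb{R}$ in one-one correspondence with
$\{1, 2, \ldots, 2^n\}$.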

Preservance weights, ie, weights $w \in P_n(\alpha)$,
$\alpha \in \mathbb{R}^+$, which establish a preservance of
$\mathcal{B}^n(\zeta, \vartheta)$ in $\mathcal{L}_w$ (through points in
$\mathcal{L}_w(\alpha, \mathcal{B}^n(\zeta, \vartheta)) \subset \mathcal{L}_w$),
will in the following be enumerated with the notation
$w_{\langle c \rangle}$, where $c$ denotes the enumerator index. Figure 3.2
(p. 122) indicates an enumeration scheme^ for the preservance weights of
the collection of scaled and translated binary vectors. As an example, all
the preservance weights for an input space of dimensionality $n = 3$
($\alpha$ chosen to be unity) are enumerated in Table 3.1 (p. 123). In the
enumeration suggested above,


^The enumeration scheme is given in a pseudo code.

for $c = 0, 1, 2, \ldots, n!\,2^n - 1$, given $n$, $n = 1, 2, \ldots$; and $\alpha \in \mathbb{R}^+$

do

1. Evaluate $c_p = \lfloor c / 2^n \rfloor$ and $c_s = c \bmod 2^n$, where
   $\lfloor \cdot \rfloor$ and mod are the floor and modulo operations,
   respectively.

2. Construct ${}^1D = \{ {}^1w_i = 2^{i-1} \mid i = 1, 2, \ldots, n \}$, a set
   in which the elements are put in ascending order.

3. Assign to $i$ the value 1 and to $c_i$ the value $c_p$.

4. Assign to $w_i$ the value ${}^iw_{j_i}$, where
   $j_i = \lfloor c_i / (n - i)! \rfloor$ and ${}^iw_{j_i}$ is the
   $(j_i + 1)$th element of ${}^iD$.

5. Evaluate $c_{i+1} = c_i - (n - i)!\, j_i$ and construct the ordered set
   ${}^{i+1}D = {}^iD \setminus \{ {}^iw_{j_i} \}$, where the elements are in
   ascending order.

6. Advance $i$ by 1 and repeat steps 4 and 5 as long as $i$ is not more
   than $n$.

7. Assign to numbers $s_k$, $k = 1, 2, \ldots, n$, values +1 or -1 such
   that $\sum_{k=1}^{n} s_k 2^{k-1} = 2c_s - (2^n - 1)$. ($s$ is the binary
   representation, with symbols +1 and -1, of the decimal number $c_s$.)

8. Assign to $w_{\langle c \rangle}$ the direct product of the vectors $s$,
   scaled by $-\alpha$, and $w = [w_1, w_2, \ldots, w_n]$, ie,
   $w_{\langle c \rangle k} = -\alpha s_k w_k$, $k = 1, 2, \ldots, n$.

done

Figure 3.2: Scheme for enumerating preservance weights of
$\mathcal{B}^n(\zeta, \vartheta)$, $\zeta \in \mathbb{R}^+$ and $\vartheta \in \mathbb{R}^n$
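The enumeration scheme of Figure 3.2 can be sketched in Python as follows. This is one possible reading of the pseudo code, not the author's implementation: the permutation index is decoded with successive divisions by $(n-i)!$, and the sign index is decoded bit by bit so that a set bit $k$ of $c_s$ yields a negative $k$-th element, which is the convention under which the entries of Table 3.1 are reproduced. The function names are hypothetical.

```python
from math import factorial

def preservance_weight(c, n, alpha=1):
    """Return the c-th preservance weight (0 <= c < n! * 2**n) as a tuple."""
    c_p, c_s = divmod(c, 2 ** n)          # step 1: permutation and sign indices
    pool = [2 ** i for i in range(n)]     # step 2: magnitudes in ascending order
    magnitudes, rest = [], c_p            # step 3
    for i in range(1, n + 1):             # steps 4 to 6
        j, rest = divmod(rest, factorial(n - i))
        magnitudes.append(pool.pop(j))    # take the (j + 1)th remaining element
    # step 7: s_k = +1 where bit k of c_s is set, -1 otherwise
    s = [1 if (c_s >> k) & 1 else -1 for k in range(n)]
    # step 8: element-wise product of s and the magnitudes, scaled by -alpha
    return tuple(-alpha * s_k * m for s_k, m in zip(s, magnitudes))

def enumerate_preservance_weights(n, alpha=1):
    return [preservance_weight(c, n, alpha)
            for c in range(factorial(n) * 2 ** n)]

weights = enumerate_preservance_weights(3)
print(len(weights))
print(weights[0], weights[1], weights[8], weights[47])
```

For $n = 3$ and $\alpha = 1$ the list has 48 distinct entries, beginning with $[+1\ +2\ +4]$ and ending with $[-4\ -2\ -1]$.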
Table 3.1: Preservance weights for n = 3 with α = 1

    w<0>  = [+1 +2 +4]        w<1>  = [-1 +2 +4]
    w<2>  = [+1 -2 +4]        w<3>  = [-1 -2 +4]
    w<4>  = [+1 +2 -4]        w<5>  = [-1 +2 -4]
    w<6>  = [+1 -2 -4]        w<7>  = [-1 -2 -4]
    w<8>  = [+1 +4 +2]        w<9>  = [-1 +4 +2]
    w<10> = [+1 -4 +2]        w<11> = [-1 -4 +2]
    w<12> = [+1 +4 -2]        w<13> = [-1 +4 -2]
    w<14> = [+1 -4 -2]        w<15> = [-1 -4 -2]
    w<16> = [+2 +1 +4]        w<17> = [-2 +1 +4]
    w<18> = [+2 -1 +4]        w<19> = [-2 -1 +4]
    w<20> = [+2 +1 -4]        w<21> = [-2 +1 -4]
    w<22> = [+2 -1 -4]        w<23> = [-2 -1 -4]
    w<24> = [+2 +4 +1]        w<25> = [-2 +4 +1]
    w<26> = [+2 -4 +1]        w<27> = [-2 -4 +1]
    w<28> = [+2 +4 -1]        w<29> = [-2 +4 -1]
    w<30> = [+2 -4 -1]        w<31> = [-2 -4 -1]
    w<32> = [+4 +1 +2]        w<33> = [-4 +1 +2]
    w<34> = [+4 -1 +2]        w<35> = [-4 -1 +2]
    w<36> = [+4 +1 -2]        w<37> = [-4 +1 -2]
    w<38> = [+4 -1 -2]        w<39> = [-4 -1 -2]
    w<40> = [+4 +2 +1]        w<41> = [-4 +2 +1]
    w<42> = [+4 -2 +1]        w<43> = [-4 -2 +1]
    w<44> = [+4 +2 -1]        w<45> = [-4 +2 -1]
    w<46> = [+4 -2 -1]        w<47> = [-4 -2 -1]
the preservance weight $w_{\langle 0 \rangle}$ for all dimensions ($n$) and
$\alpha = 1$ refers to the weight used in a positional representation of
decimal numbers as $n$ bit binary numbers.

Weights that preserve all points of the Boolean hyper-cube
$\mathcal{B}^n(\zeta, \vartheta)$, $n = 1, 2, \ldots$;
$\zeta \in \mathbb{R}^+$ and $\vartheta \in \mathbb{R}^n$, along a
one-dimensional linear sub-space (described by the weight) define a
generalized positional numbering system. In Figure 3.1 (p. 114) the
directions of weight vectors have been shown to have uniform angular
spacing; this property, though depicted in the case of two-dimensional
vector collections, is assured in higher dimensions by the corollary to
Theorem 3.1.2. Seeking a structure to the collection of preservance
weights, the following hold.

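Proposition 3.1.4 Preservation of the Boolean hyper-cube
$\mathcal{B}^n(\zeta, \vartheta)$, $\zeta \in \mathbb{R}^+$,
$\vartheta \in \mathbb{R}^n$, under a weight
$w_{\langle c \rangle} \in P_n(\alpha)$ is equivalent to preservation under
a permuted version of $w_{\langle 0 \rangle} \in P_n(\alpha)$, the
permutation being given by $P_{c0} = [p_{ij}]$ where

$p_{ij} = \mathrm{sgn}(w_{\langle c \rangle i})$ if $j = 1 + \log_2 (|w_{\langle c \rangle i}| / \alpha)$, and $p_{ij} = 0$ otherwise.

Proof: Without any loss of generality, the statement will be established
with $\alpha = 1$. Let the elements $w_{\langle c \rangle i}$ be expressed as

$w_{\langle c \rangle i} = s_i 2^{o_i}, \quad s_i = \pm 1, \; o_i \in \{0, 1, \ldots, n - 1\}, \; i = 1, 2, \ldots, n,$

in accordance with the prescription given in Theorem 3.1.1 (p. 113).
Noting that $w_{\langle 0 \rangle i} = 2^{i-1}$ by choice and that
$|w_{\langle c \rangle i}| \neq |w_{\langle c \rangle j}|$, $i \neq j$,
$i, j = 1, 2, \ldots, n$, the assignment suggested for the elements
$p_{ij}$ immediately follows on recognizing that
$o_i = \log_2 |w_{\langle c \rangle i}|$ and
$s_i = \mathrm{sgn}(w_{\langle c \rangle i})$.

□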


Corollary to Proposition 3.1.4

a. The permutation to $w_{\langle 0 \rangle}$ is given by
   $P_{0c} = P_{c0}^T$, where $A^T$ denotes the transpose of a matrix $A$.

b. The permutation to $w_{\langle c_2 \rangle}$ is given by
   $P_{c_2 c_1} = P_{c_2 0} P_{0 c_1}$.
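The signed permutation of Proposition 3.1.4 and the transpose relation of its corollary can be illustrated with plain lists. In this sketch the matrix is built so that it maps $w_{\langle 0 \rangle}$ to a given weight; the function names and the example weights are assumptions made for illustration.

```python
from math import log2

def signed_permutation(w_c, alpha=1):
    """Build P with p_ij = sgn(w_c[i]) at column j = log2(|w_c[i]| / alpha)."""
    n = len(w_c)
    P = [[0] * n for _ in range(n)]
    for i, w in enumerate(w_c):
        j = round(log2(abs(w) / alpha))      # zero-based column index
        P[i][j] = 1 if w > 0 else -1
    return P

def matvec(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

w0 = [1, 2, 4]            # the reference weight, w_i = 2^(i-1)
wc1 = [-4, 1, -2]         # illustrative preservance weights (assumption)
wc2 = [2, -4, -1]

P1 = signed_permutation(wc1)
P2 = signed_permutation(wc2)

print(matvec(P1, w0))                          # recovers wc1
print(matvec(transpose(P1), wc1))              # transpose maps back to w0
print(matvec(P2, matvec(transpose(P1), wc1)))  # composition reaches wc2
```

Since each row and column of the matrix holds a single entry of magnitude one, its transpose is its inverse, which is why the return permutation needs no separate construction.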

This statement is obvious and, hence, no proof is provided. In view
of Proposition 3.1.4 (p. 124) and its corollary, a characterization of
processor representation in isolated neurons does not lose generality when
the weights are chosen to be in any of the finitely many directions
suggested by the corollary to Theorem 3.1.2 (p. 115). Proposition 3.1.3
(p. 120) indicates that the preservance weight of a scaled and translated
Boolean hyper-cube belongs to a scaled Boolean hyper-cube, though as
indicated in Theorem 3.1.2 (p. 115), not all points of a (scaled) Boolean
hyper-cube are valid as preservance weights of Boolean hyper-cubes.

It is of interest to know the possibility of using valid members of
the scaled and translated Boolean hyper-cube^
$\mathcal{B}^n(\zeta_1, \vartheta_1)$ as the preservance weights of
$\mathcal{B}^n(\zeta_2, \vartheta_2)$, a Boolean hyper-cube with a scale factor

^Note that only a few members of any scaled and translated Boolean hyper-cube
contribute to the collection of preservance weights $P_n$.

and translation allowed to be distinct from that of the Boolean
hyper-cube from which the preservance weights are chosen. Referring to
Equation 3.1a (p. 110), the evaluation of $\eta$ at an input $x$ under a
valid preservance weight $w \in \mathcal{B}^n(\zeta_1, \vartheta_1)$ (the
preservance weight is assumed, for simplicity, to be described as
$w = \zeta_1 w_{\langle 0 \rangle} + \vartheta_1$,
$w_{\langle 0 \rangle} \in P_n(1)$) is given as
$\eta(x) = \zeta_1\, w_{\langle 0 \rangle} \cdot x + \vartheta_1 \cdot x$.

Theorem 3.1.2 (p. 115) shows that preservance is unaffected by scaling
in the weights, provided the scale factor is positive. As a result only
the effect of superposition of weight vectors on preservance remains to
be studied. The structure of $\eta$ for weights in
$\mathcal{B}^n(\zeta_1, \vartheta_1)$ suggests two possibilities: (a) the
composition of preservance weights through weights that are not themselves
valid as preservance weights and (b) the decomposition of preservance
weights in terms of preservance weights. Of these only the latter
situation is of interest as the former is trivially satisfied by the
structure of vector spaces. The following statement establishes a
characterization of preservance weights.

Theorem 3.1.3 Given two preservance weights of the discrete space
$\mathcal{B}^n(\zeta, \vartheta)$, $\zeta \in \mathbb{R}^+$,
$\vartheta \in \mathbb{R}^n$, $w_1 \in P_n(\alpha_1)$ and
$w_2 \in P_n(\alpha_2)$, $\alpha_1, \alpha_2 \in \mathbb{R}^+$, a weight
$w$ given as $w = w_1 + w_2$ is a preservance weight of
$\mathcal{B}^n(\zeta, \vartheta)$ if and only if

$\frac{|w_1 \cdot w_2|}{\|w_1\| \, \|w_2\|} = 1,$

with $\alpha_1 \neq \alpha_2$ if $w_2 = -(\alpha_2 / \alpha_1)\, w_1$.

Proof: If $|w_1 \cdot w_2|$ is equal to $\|w_1\| \, \|w_2\|$ then
$w_2 \in \mathcal{L}_{w_1}$, which implies that
$w \in \mathcal{L}_{w_1}$. As a consequence of Theorem 3.1.2 (p. 115) all
members of the one-dimensional subspace $\mathcal{L}_{w_1}$ (of
$\mathbb{R}^n$) other than the null vector are preservance weights of the
discrete set $\mathcal{B}^n(\zeta, \vartheta)$ if the weight $w_1$ is a
preservance weight of $\mathcal{B}^n(\zeta, \vartheta)$. The condition
$\alpha_1 \neq \alpha_2$ if $w_2 = -(\alpha_2 / \alpha_1)\, w_1$ ensures
that the composition $w$ is not a null vector. This establishes the 'if'
part.

[Figure 3.3: Superposition of preservance weights, showing the vertices
(-1,1), (1,1), (-1,-1) and (1,-1), the preservance weights and the weight
direction.]

Figure 3.3 illustrates a weight vector composed from preservance
weights that do not satisfy the requirement
$|w_1 \cdot w_2| = \|w_1\| \, \|w_2\|$. In such a superposition, either
one-one correspondence or regularity in the images of
$\mathcal{B}^n(\zeta, \vartheta)$ is not exhibited and, as a consequence,
the composition $w$ fails to be a preservance weight for
$\mathcal{B}^n(\zeta, \vartheta)$. This observation counters the negation
of the 'only if' part and the resulting contradiction establishes the
necessity of the constituent preservance weights in a composition to be in
the same 'direction.'

□ 
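The two ways in which a superposition can fail, and the collinear case in which it succeeds, can be seen on small examples. The following sketch assumes the unscaled, untranslated hyper-cube with $n = 3$; the helper `is_preservance` and the particular weights are illustrative choices, not taken from the text.

```python
from itertools import product

def is_preservance(weight):
    """True if w.x over {-1,+1}^n is distinct and uniformly spaced."""
    n = len(weight)
    vals = sorted(sum(w * x for w, x in zip(weight, xs))
                  for xs in product((-1, 1), repeat=n))
    gaps = {b - a for a, b in zip(vals, vals[1:])}
    return len(set(vals)) == len(vals) and len(gaps) == 1

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

w1 = (1, 2, 4)                        # preservance weight, alpha_1 = 1

parallel = add(w1, (3, 6, 12))        # w2 = 3 w1, same direction
antiparallel = add(w1, (-3, -6, -12)) # w2 = -3 w1, alpha_1 != alpha_2
colliding = add(w1, (2, 1, 4))        # (3, 3, 8): two images coincide
irregular = add(w1, (2, 4, 1))        # (3, 6, 5): distinct, unevenly spaced
null = add(w1, (-1, -2, -4))          # alpha_1 == alpha_2, w is the null vector

for name, w in [("parallel", parallel), ("antiparallel", antiparallel),
                ("colliding", colliding), ("irregular", irregular),
                ("null", null)]:
    print(name, w, is_preservance(w))
```

Only the two collinear, non-null compositions pass the check; the other sums lose either one-one correspondence or regularity, in line with the statement of the theorem.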
This theorem states that a superposition of the preservance weights
of $\mathcal{B}^n(\zeta, \vartheta)$ is a preservance weight of
$\mathcal{B}^n(\zeta, \vartheta)$ if and only if the constituent
preservance weights differ from each other only in scale. As a
consequence, preservance weights of the discrete space
$\mathcal{B}^n(\zeta, \vartheta)$ cannot be selected as basis vectors of
$P_n$. (Given that the finite number of directions in which preservance
weights can be chosen is exponentially dependent on the dimensionality $n$
and the structure of preservance weights as given in the corollary to
Theorem 3.1.2 (p. 115), it is not difficult to find a collection of $n$
linearly independent preservance weights.)

As established in the preceding discussion, the operation of inner
product employing preservance weights induces a one-one correspondence
between the $2^n$ vertices (ie, points) of
$\mathcal{B}^n(\zeta, \vartheta)$ and the $2^n$ uniformly spaced points in
the one-dimensional sub-space
$\mathcal{L}_w(\alpha, \mathcal{B}^n(\zeta, \vartheta)) \subset \mathcal{L}_w$
for any $\alpha \in \mathbb{R}^+$. Under this operation, it is easy to see
that every point in $\mathcal{B}^n(\zeta, \vartheta)$ is identified with a
distinct hyper-plane in $\mathbb{R}^n$. Preservation of points in an input
space then amounts to an identification of specific hyper-planes and
choice of a specific point in each hyper-plane. These specific points are
now identified with distinct points in the one-dimensional sub-space
identified by the preservance weight.

The discrete space $\mathcal{B}^n$ is a collection of input vectors whose
elements are commonly interpreted as an assertion of the presence, or
absence, of features related to observations that are subjected to
inferencing. (In this sense McCulloch & Pitts (1943) and subsequent
investigators have identified neuron inputs with propositions and the
operation of neurons (as well as neural networks) with formulae of the
propositional calculus.) A generalization of
$\mathcal{B}^n(\zeta, \vartheta)$ to discrete subsets of $\mathbb{R}^n$
that are chosen to have a regular structure and are in one-one
correspondence with discrete subsets of $\mathcal{L}_w$ for a given
preservance weight $w$ is provided in the following.

Consider the construction for a given $n$, $n = 1, 2, \ldots$,

$\mathcal{S}_r^n(\zeta, \vartheta) = \bigcup_{i=1}^{r} \mathcal{B}^n(\zeta_i, \vartheta),$

where $r = 1, 2, \ldots$, $\vartheta \in \mathbb{R}^n$ and the coefficients
$\zeta_i$ and $\zeta_j$ are relatively (mutually) prime for all $i, j$,
$i \neq j$, $i, j = 1, 2, \ldots, r$. This space is constructed as a union
of scaled Boolean hyper-cubes with a common translation, taking care that
the scale factors do not force multiple points to have the same image in
$\mathcal{L}_w$ for a preservance weight $w$ corresponding to the
constituent discrete sets $\mathcal{B}^n(\zeta_i, \vartheta)$. It is of
interest now to seek the preservance weights of
$\mathcal{S}_r^n(\zeta, \vartheta)$ based on the available knowledge about
the preservance weights of scaled and translated Boolean hyper-cubes.

Proposition 3.1.5 Weights $w \in P_n$ establish distinct images, in
$\mathcal{L}_w$, of all points of the discrete space
$\mathcal{S}_r^n(\zeta, \vartheta)$ for all $n = 1, 2, \ldots$;
$r = 1, 2, \ldots$; $\zeta \in \mathbb{R}^+$ and
$\vartheta \in \mathbb{R}^n$.

Proof: Every Boolean hyper-cube
$\mathcal{B}^n(\zeta_i, \vartheta) \subset \mathcal{S}_r^n(\zeta, \vartheta)$,
$\zeta_i \in \mathbb{R}^+$, is preserved in $\mathcal{L}_w$ as indicated in
Proposition 3.1.3 (p. 120). When these hyper-cubes are scaled using
coefficients that are mutually prime, the preservation points in
$\mathcal{L}_w$ corresponding to the different hyper-cubes are distinct
and, hence, distinctness of the images in the union is assured. On the
other hand, if the coefficients corresponding to different hyper-cubes are
not relatively prime, the required one-one correspondence between
$\mathcal{S}_r^n(\zeta, \vartheta)$ and any discrete subset of
$\mathcal{L}_w$ cannot be ensured.

□ 


Figure 3.4 provides an illustration of the discrete space
$\mathcal{S}_r^n(1, 0)$ with an accompanying diagram of points in
$\mathcal{L}_w$ that are in one-one correspondence with the points in the
discrete space. As evident from this illustration, preservation points in
$\mathcal{L}_w$ (for a preservance weight $w$) are, in general, irregularly
spaced, the spacing increasing with the distance from the preservation
point corresponding to the centroid of $\mathcal{S}_r^n(\zeta, \vartheta)$.
The lack of uniformity in spacing between the points identified in
$\mathcal{L}_w$ corresponding to points in
$\mathcal{S}_r^n(\zeta, \vartheta)$ given the scaling coefficients
$\zeta_i$, $i = 1, 2, \ldots, r$, precludes any further consideration of
preservation of the discrete space $\mathcal{S}_r^n(\zeta, \vartheta)$.
However, the approach of identifying points in $\mathbb{R}^n$ with a union
of scaled and (origin) translated versions of the basic Boolean hyper-cube
$\mathcal{B}^n(1, 0)$ is useful as indicated in the following.



[Figure 3.4: $\mathcal{S}(1, 0)$; points shown along the weight vector direction]
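Let $\mathcal{P}_r^n(\zeta, \vartheta)$ denote the recursive construction

$$\mathcal{P}_0^n(\zeta, \vartheta) = \emptyset, \tag{3.3a}$$

$$\mathcal{P}_1^n(\zeta, \vartheta) = \mathcal{B}^n(\zeta, \vartheta), \tag{3.3b}$$

$$\mathcal{P}_r^n(\zeta, \vartheta) = \mathcal{P}_{r-1}^n(\zeta, \vartheta) \cup \bigcup_{t=1}^{2^{n(r-1)}} \mathcal{B}^n\left(2^{-n(r-1)} \zeta, x^{(t)}\right), \quad r = 2, 3, \ldots, \tag{3.3c}$$

where $x^{(t)}$, $t = 1, 2, \ldots, 2^{n(r-1)}$, are the $2^{n(r-1)}$ (ordered) points of the discrete
subset $\mathcal{P}_{r-1}^n(\zeta, \vartheta) \setminus \mathcal{P}_{r-2}^n(\zeta, \vartheta)$ of $\mathbb{R}^n$, $n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \mathbb{R}_+$ and
$\vartheta \in \mathbb{R}^n$. This discrete set differs from $\mathcal{S}_r^n(\zeta, \vartheta)$ in that the constituent
Boolean hyper-cubes are scaled and translated, the scale factors and
translations being related in powers of 2. Seeking the preservance of
$\mathcal{P}_r^n(\zeta, \vartheta)$, the following hold.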

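Proposition 3.1.6  Weights $w \in P_n$ also preserve, in $\mathcal{L}_w$, all points of
$\mathcal{P}_r^n(\zeta, \vartheta)$, $n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \mathbb{R}_+$, and $\vartheta \in \mathbb{R}^n$.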

Proof: For any weight $w \in P_n$ every scaled and translated Boolean
hyper-cube contained in $\mathcal{P}_r^n(\zeta, \vartheta)$, for all values of $r$, $r = 1, 2, \ldots$,
is preserved in $\mathcal{L}_w$ as established in Proposition 3.1.3 (p. 120).
Thus preservance of $\mathcal{P}_r^n(\zeta, \vartheta)$ is ensured. At any step $r$, $r = 2, 3, \ldots$, the
images in $\mathcal{L}_w$ of points in the union of Boolean hyper-cubes that specify
points in addition to those accumulated at the end of step $r - 1$ are
uniformly spaced, with the spacing between adjacent points being given
as $2^{-n}$ times the smallest spacing between images in $\mathcal{L}_w$ of points in
$\mathcal{P}_{r-1}^n(\zeta, \vartheta)$. (See Figure 3.5 (p. 134) for a clarification of the construction
when $n = 2$ and $r = 3$.) As the translations in step $r$ are given by the
points added in step $r - 1$, the collections of uniformly spaced images in
$\mathcal{L}_w$ are distinct in all the steps, thereby satisfying one-one
correspondence and regularity. Order preservation, based on the preservance of
$\mathcal{B}^n(\zeta, \vartheta)$, follows from the fact that the set $\mathcal{P}_r^n(\zeta, \vartheta)$ and its image in
$\mathcal{L}_w$ are composed of non-overlapping sets, the constituent sets being,
respectively, scaled and translated Boolean hyper-cubes and their images
in $\mathcal{L}_w$.

□ 
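The recursive construction of Equation 3.3 and the distinctness of the images asserted above may be checked numerically with the following minimal sketch. It is illustrative only: the function names are hypothetical, the hyper-cube vertices are taken to be bipolar ($\pm\zeta$), the scale factor $2^{-n(r-1)}\zeta$ at step $r$ and the weight $w = (2, 1)$ for $n = 2$ are assumptions inferred from the entries of Table 3.2, not statements taken from the construction itself.

```python
from itertools import product


def boolean_hypercube(n, scale=1.0, shift=None):
    """Vertices of the bipolar hyper-cube {-scale, +scale}^n translated by shift."""
    shift = shift if shift is not None else (0.0,) * n
    return [tuple(s + scale * b for s, b in zip(shift, bits))
            for bits in product((-1.0, 1.0), repeat=n)]


def build_levels(n, r, scale=1.0, shift=None):
    """Return r lists; entry k holds the points added at step k + 1.

    ASSUMPTION: at step k >= 2 a cube of scale `scale * 2**(-n*(k-1))` is
    attached to every point added at step k - 1.
    """
    levels = [boolean_hypercube(n, scale, shift)]
    for step in range(2, r + 1):
        s = scale * 2.0 ** (-n * (step - 1))
        levels.append([p for centre in levels[-1]
                       for p in boolean_hypercube(n, s, centre)])
    return levels


def project(points, w):
    """Inner products of the points with the weight w."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in points]


levels = build_levels(n=2, r=3)
points = [p for level in levels for p in level]
images = project(points, w=(2.0, 1.0))  # assumed preservance weight for n = 2
```

All coordinates involved are dyadic rationals, so the floating point comparisons in this sketch are exact; the number of distinct images equals the number of points, which is the one-one correspondence required of a preservance weight.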

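Proposition 3.1.7  The centroid of $\mathcal{P}_r^n(\zeta, \vartheta)$ is identical to that of
$\mathcal{B}^n(\zeta, \vartheta)$, $n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \mathbb{R}_+$, and $\vartheta \in \mathbb{R}^n$.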

Proposition 3.1.7 is obvious. In view of Proposition 3.1.6, $P_n$, the
space of preservance weights, will be referred to, in the sequel, as the
collection of weights that preserve $\mathcal{P}_r^n(\zeta, \vartheta)$ in $\mathcal{L}_w$. The collection of
points in $\mathcal{L}_w$ that are put in one-one correspondence with $\mathcal{P}_r^n(\zeta, \vartheta)$ by
the preservance weights will be denoted by $\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$. (Clearly,
$\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta)) \supseteq \mathcal{L}_w(\alpha, \mathcal{B}^n(\zeta, \vartheta))$, for all $\alpha, \zeta \in \mathbb{R}_+$, $\vartheta \in \mathbb{R}^n$, and the
containment is proper when $r = 2, 3, \ldots$) Note that as a consequence
of the manner in which the space $\mathcal{P}_r^n(\zeta, \vartheta)$ is constructed, the interval
between any two adjacent preservation points of $\mathcal{P}_{r-1}^n(\zeta, \vartheta)$ in $\mathcal{L}_w$ is
sub-divided to accommodate $2^n$ additional preservation points. This
amounts to a ranking (denoted by $r$) of the preservation points in $\mathcal{L}_w$.

An illustration of the discrete space $\mathcal{P}_3^2(1, 0)$ with an accompanying
diagram of points in $\mathcal{L}_w$ that are in isomorphism with the points in
the discrete space is provided in Figure 3.5: the points of $\mathbb{R}^2$ that
belong to $\mathcal{P}_3^2(1, 0)$ and are in one-one correspondence (denoted by $\leftrightarrow$) with
those of $\mathcal{L}_w(1, \mathcal{P}_3^2(1, 0))$ (corresponding to the preservance weight
$w^{\langle 0 \rangle}$ with $\alpha = 1$) are indicated in Table 3.2 (p. 135). From the
illustration, regularity of the discrete space and uniformity of spacing between
preservation points in $\mathcal{L}_w$ are immediately evident. The following
characteristics of $\mathcal{P}_r^n(\zeta, \vartheta)$ are noteworthy.

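Proposition 3.1.8  The number of distinct points included in
the discrete space $\mathcal{P}_r^n(\zeta, \vartheta)$ is given by

$$|\mathcal{P}_r^n(\zeta, \vartheta)| = \frac{2^n (2^{nr} - 1)}{2^n - 1},$$

subject to the understanding that $|\mathcal{P}_0^n(\zeta, \vartheta)| = 0$ and $|\mathcal{P}_1^n(\zeta, \vartheta)| = 2^n$,
$n = 1, 2, \ldots$, $\zeta \in \mathbb{R}_+$, and $\vartheta \in \mathbb{R}^n$.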

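Proof: For all $n$, $n = 1, 2, \ldots$, the construction indicated in
Equation 3.3 (p. 131) suggests that $\mathcal{P}_1^n(\zeta, \vartheta)$ is a Boolean hyper-cube and
thus $|\mathcal{P}_1^n(\zeta, \vartheta)| = 2^n$ for all the admissible values of $\zeta$ and $\vartheta$. The space
$\mathcal{P}_2^n(\zeta, \vartheta)$ is obtained by identifying with every vertex of $\mathcal{P}_1^n(\zeta, \vartheta)$ a
discrete subspace of $\mathbb{R}^n$ isomorphic to $\mathcal{P}_1^n(\zeta, \vartheta)$ and this, on taking unions
of the discrete subspaces, establishes that $|\mathcal{P}_2^n(\zeta, \vartheta)| = 2^n |\mathcal{P}_1^n(\zeta, \vartheta)| +
|\mathcal{P}_1^n(\zeta, \vartheta)| = 2^{2n} + 2^n$.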
Figure 3.5: Illustration of $\mathcal{P}_3^2(1, 0)$

Table 3.2: Points of $\mathcal{P}_3^2(1, 0)$ $\leftrightarrow$ $\mathcal{L}_w(1, \mathcal{P}_3^2(1, 0))$

(-1.3125,-1.3125) <-> -3.9375   (-1.3125,-1.1875) <-> -3.8125   (-1.25,-1.25)     <-> -3.75
(-1.1875,-1.3125) <-> -3.6875   (-1.1875,-1.1875) <-> -3.5625   (-1.3125,-0.8125) <-> -3.4375
(-1.3125,-0.6875) <-> -3.3125   (-1.25,-0.75)     <-> -3.25     (-1.1875,-0.8125) <-> -3.1875
(-1.1875,-0.6875) <-> -3.0625   (-1,-1)           <-> -3        (-0.8125,-1.3125) <-> -2.9375
(-0.8125,-1.1875) <-> -2.8125   (-0.75,-1.25)     <-> -2.75     (-0.6875,-1.3125) <-> -2.6875
(-0.6875,-1.1875) <-> -2.5625   (-0.8125,-0.8125) <-> -2.4375   (-0.8125,-0.6875) <-> -2.3125
(-0.75,-0.75)     <-> -2.25     (-0.6875,-0.8125) <-> -2.1875   (-0.6875,-0.6875) <-> -2.0625
(-1.3125,0.6875)  <-> -1.9375   (-1.3125,0.8125)  <-> -1.8125   (-1.25,0.75)      <-> -1.75
(-1.1875,0.6875)  <-> -1.6875   (-1.1875,0.8125)  <-> -1.5625   (-1.3125,1.1875)  <-> -1.4375
(-1.3125,1.3125)  <-> -1.3125   (-1.25,1.25)      <-> -1.25     (-1.1875,1.1875)  <-> -1.1875
(-1.1875,1.3125)  <-> -1.0625   (-1,1)            <-> -1        (-0.8125,0.6875)  <-> -0.9375
(-0.8125,0.8125)  <-> -0.8125   (-0.75,0.75)      <-> -0.75     (-0.6875,0.6875)  <-> -0.6875
(-0.6875,0.8125)  <-> -0.5625   (-0.8125,1.1875)  <-> -0.4375   (-0.8125,1.3125)  <-> -0.3125
(-0.75,1.25)      <-> -0.25     (-0.6875,1.1875)  <-> -0.1875   (-0.6875,1.3125)  <-> -0.0625
(0.6875,-1.3125)  <->  0.0625   (0.6875,-1.1875)  <->  0.1875   (0.75,-1.25)      <->  0.25
(0.8125,-1.3125)  <->  0.3125   (0.8125,-1.1875)  <->  0.4375   (0.6875,-0.8125)  <->  0.5625
(0.6875,-0.6875)  <->  0.6875   (0.75,-0.75)      <->  0.75     (0.8125,-0.8125)  <->  0.8125
(0.8125,-0.6875)  <->  0.9375   (1,-1)            <->  1        (1.1875,-1.3125)  <->  1.0625
(1.1875,-1.1875)  <->  1.1875   (1.25,-1.25)      <->  1.25     (1.3125,-1.3125)  <->  1.3125
(1.3125,-1.1875)  <->  1.4375   (1.1875,-0.8125)  <->  1.5625   (1.1875,-0.6875)  <->  1.6875
(1.25,-0.75)      <->  1.75     (1.3125,-0.8125)  <->  1.8125   (1.3125,-0.6875)  <->  1.9375
(0.6875,0.6875)   <->  2.0625   (0.6875,0.8125)   <->  2.1875   (0.75,0.75)       <->  2.25
(0.8125,0.6875)   <->  2.3125   (0.8125,0.8125)   <->  2.4375   (0.6875,1.1875)   <->  2.5625
(0.6875,1.3125)   <->  2.6875   (0.75,1.25)       <->  2.75     (0.8125,1.1875)   <->  2.8125
(0.8125,1.3125)   <->  2.9375   (1,1)             <->  3        (1.1875,0.6875)   <->  3.0625
(1.1875,0.8125)   <->  3.1875   (1.25,0.75)       <->  3.25     (1.3125,0.6875)   <->  3.3125
(1.3125,0.8125)   <->  3.4375   (1.1875,1.1875)   <->  3.5625   (1.1875,1.3125)   <->  3.6875
(1.25,1.25)       <->  3.75     (1.3125,1.1875)   <->  3.8125   (1.3125,1.3125)   <->  3.9375
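In the construction of spaces $\mathcal{P}_r^n(\zeta, \vartheta)$, $r = 3, 4, \ldots$, association of
the discrete subspaces is, however, not with respect to every point of
$\mathcal{P}_{r-1}^n(\zeta, \vartheta)$ but only with the points in $\mathcal{P}_{r-1}^n(\zeta, \vartheta) \setminus \mathcal{P}_{r-2}^n(\zeta, \vartheta)$, ie, points
of $\mathcal{P}_{r-1}^n(\zeta, \vartheta)$ not contained in $\mathcal{P}_{r-2}^n(\zeta, \vartheta)$. This establishes the recursive
relationship in the cardinality of the spaces $\mathcal{P}_r^n(\zeta, \vartheta)$, $r = 2, 3, \ldots$, which
gets extended to the case when $r = 1$ with the assumption that $|\mathcal{P}_0^n(\zeta, \vartheta)| = 0$.
(This assumption is justified as $\mathcal{P}_0^n(\zeta, \vartheta)$ is defined to be the empty set.)
On carrying out the suggested recursion, the cardinality of $\mathcal{P}_r^n(\zeta, \vartheta)$ is
obtained as the geometric series:

$$|\mathcal{P}_r^n(\zeta, \vartheta)| = \sum_{t=1}^{r} 2^{tn}.$$

It is then simple to see that $|\mathcal{P}_r^n(\zeta, \vartheta)|$ is indeed given by $\frac{2^n (2^{nr} - 1)}{2^n - 1}$.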

□ 
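The agreement between the geometric series and the closed form can be confirmed with a short computation. The following is a minimal sketch (the function names are illustrative and not part of the thesis) that evaluates both expressions with exact integer arithmetic.

```python
def cardinality_by_recursion(n, r):
    """Sum of the 2**(t*n) points added at steps t = 1, ..., r."""
    total = 0
    for t in range(1, r + 1):
        total += 2 ** (t * n)
    return total


def cardinality_closed_form(n, r):
    """Closed form 2**n * (2**(n*r) - 1) / (2**n - 1); the division is exact."""
    return 2 ** n * (2 ** (n * r) - 1) // (2 ** n - 1)


# Compare the two expressions over the range of n and r used in Table 3.3.
agree = all(cardinality_by_recursion(n, r) == cardinality_closed_form(n, r)
            for n in range(1, 11) for r in range(1, 6))
```

For instance, `cardinality_closed_form(2, 3)` evaluates to 84, the number of points listed in Table 3.2.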


Note that this result follows directly from the definition as in the
$r$th step of the recursion, $2^{nr}$ points are being added. However, a more
detailed argument has been provided in the proof to help a clarification
of the construction of the discrete space $\mathcal{P}_r^n(\zeta, \vartheta)$. Table 3.3 lists the
value of $|\mathcal{P}_r^n(1, 0)|$ for a few small values of $n$ and $r$. Continuing the
recursion to the limit, the following emerges.

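Proposition 3.1.9  $\mathcal{P}_r^n(\zeta, \vartheta) \subset \mathbb{R}^n$, $n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \mathbb{R}_+$
and $\vartheta \in \mathbb{R}^n$, is a discrete space containing points sampled from open
balls centered at the vertices of the Boolean hyper-cube $\mathcal{B}^n(\zeta, \vartheta)$. The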

radius of each of the distinct balls is given by lim . 
Table 3.3: Cardinality of $\mathcal{P}_r^n(1, 0)$

  n    r = 1        r = 2            r = 3                r = 4                    r = 5
  1        2            6               14                   30                       62
  2        4           20               84                  340                    1,364
  3        8           72              584                4,680                   37,448
  4       16          272            4,368               69,904                1,118,480
  5       32        1,056           33,824            1,082,400               34,636,832
  6       64        4,160          266,304           17,043,520            1,090,785,344
  7      128       16,512        2,113,664          270,549,120           34,630,287,488
  8      256       65,792       16,843,008        4,311,810,304        1,103,823,438,080
  9      512      262,656      134,480,384       68,853,957,120       35,253,226,045,952
 10    1,024    1,049,600    1,074,791,424    1,100,586,419,200    1,127,000,493,261,824


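Proposition 3.1.10  The collection of preservation points of the
discrete space $\mathcal{P}_r^n(\zeta, \vartheta)$, $n = 1, 2, \ldots$, $\zeta \in \mathbb{R}_+$ and $\vartheta \in \mathbb{R}^n$, as $r$ and $\zeta$ tend to
$\infty$, given a preservance weight $w$, is dense in $\mathcal{L}_w$.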


This latter statement implies that the collection of hyper-planes that
are identified by the preservation points of $\mathcal{P}_r^n(\zeta, \vartheta)$, given a preservance
weight $w$, is dense in the input space $\mathbb{R}^n$. For any finite positive value
of $\zeta$, however, only finitely many preservation points exist in a bounded
interval of $\mathcal{L}_w$ for a preservance weight $w$. As indicated in Table 3.3 the
number of points in $\mathcal{P}_r^n(\zeta, \vartheta)$ grows faster (with the dimensionality $n$)
than those in $\mathcal{B}^n(\zeta, \vartheta) = \mathcal{P}_1^n(\zeta, \vartheta)$ and, thereby, a comparatively larger
number of functions are realized even with small values of $r$: in this
comparison the functions are assumed to have discrete outputs.
3.2 Function Representation in Isolated Neurons

Existence of weights $w$ that preserve all points of $\mathcal{P}_r^n(\zeta, \vartheta)$ in $\mathcal{L}_w$ for
the admissible values of $n$, $r$, $\zeta$, and $\vartheta$ ensures that functions defined
over $\mathcal{P}_r^n(\zeta, \vartheta)$ through the operation introduced in Equation 3.1 (p. 110)
can be appreciated as univariate functions when preservance weights
are employed in evaluating the projections of input patterns $x$. While
it is trivial to note that the operation of inner product maps $\mathbb{R}^n$ to $\mathbb{R}$,
selection of preservance weights enables a parameterized description
of the (discrete) input space; the projection of input points along the
preservance weight is used as the parameter.

In order that the equivalent description of functions over the
discrete space $\mathcal{P}_r^n(\zeta, \vartheta)$ be understood it is important to note that
functions over $\mathcal{P}_r^n(\zeta, \vartheta)$ when realized through isolated neurons
incorporating a weight $w$ in $P_n(\alpha)$, $\alpha \in \mathbb{R}_+$, are equivalently realized as
functions over $\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$. Note that $\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$ is a
one-dimensional discrete subset of $\mathbb{R}^n$ and contains vectors which are in
$\mathcal{L}_w$. However, the collection of normalized projections along the
preservance weight is in one-one correspondence with
$\{1, 2, \ldots, |\mathcal{P}_r^n(\zeta, \vartheta)|\}$. These observations are formalized below.

Proposition 3.2.1  Functions defined over the discrete space $\mathcal{P}_r^n(\zeta, \vartheta)$,
$n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \mathbb{R}_+$, $\vartheta \in \mathbb{R}^n$, are equivalent, under inner
product employing a preservance weight $w \in P_n(\alpha)$, for any $\alpha \in \mathbb{R}_+$, to
(finite length) sequences over the discrete index set $\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$.

Corollary to Proposition 3.2.1  Boolean functions, ie, binary
functions over $\mathcal{P}_r^n(\zeta, \vartheta)$, $n = 1, 2, \ldots$, under inner product incorporating a
preservance weight $w \in P_n(\alpha)$, for any $\alpha \in \mathbb{R}_+$, are equivalent to (finite
length) binary sequences over the discrete index set $\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$.

Proposition 3.2.2  Decision functions evaluated by neurons with
preservance weights $w \in P_n$ are

1. piece-wise constant (bivalent) sequences over $\mathcal{L}_w \subset \mathbb{R}^n$ if the
activation function $\sigma$ is hard-limiting and

2. continuous and piece-wise monotonic sequences over $\mathcal{L}_w$ if $\sigma$ is
sigmoidal.

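Theorem 3.2.1  The number of distinct $k$-ary functions that can be
defined on $\mathcal{P}_r^n(\zeta, \vartheta)$ is

$$k^{|\mathcal{P}_r^n(\zeta, \vartheta)|} = k^{2^n (2^{nr} - 1)/(2^n - 1)}, \quad n, r = 1, 2, \ldots; \ \zeta \in \mathbb{R}_+ \text{ and } \vartheta \in \mathbb{R}^n,$$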

and tends to as n increases towards oo. 

Corollary to Theorem 3.2.1  The number of distinct binary valued
functions that can be defined on $\mathcal{P}_r^n(\zeta, \vartheta)$ is $2^{2^n (2^{nr} - 1)/(2^n - 1)}$.

A simple application of combinatorial arguments using the
cardinality of $\mathcal{P}_r^n(\zeta, \vartheta)$ (see Proposition 3.1.8 (p. 133)) is sufficient to
establish the above theorem and its corollary. Though a discussion on
the nature of functions represented in isolated neurons can effortlessly
be carried out on multi-valued discrete functions (in particular when
preservance weights are incorporated in Equation 3.1a (p. 110)), only
the specific case of bivalent functions, ie, functions which take on one
of two distinct values, will be considered: such functions are important
in categorization, especially in the construction of dichotomies.

Figure 3.6: Representation of bipolar bivalent functions on $\mathcal{P}_1^2(1, 0)$

Assuming that the activation function $\sigma$ is bipolar and is symmetric
about the origin in the range, ie, $\zeta_+ + \zeta_- = 0$, where $\zeta_+$ and $\zeta_-$ are the
extreme values taken on by $\sigma$, as indicated in Chapter 2, it is easy to
see that every function on $\mathcal{P}_r^n(\zeta, \vartheta)$, for each of the admissible values of
$n$, $r$, $\zeta$ and $\vartheta$, is characterized, largely, by the number of sign changes
in the sequence defined on $\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$. Figure 3.6 (p. 140) presents
some examples of (bipolar) bivalent sequences (binary if a hard-limiting
activation function is in use) representing (bipolar) bivalent functions
on $\mathcal{P}_1^2(1, 0)$. These examples show the following.

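Proposition 3.2.3  The number of distinct sign changes in the
sequences over $\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$ which represent (bipolar bivalent)
functions over $\mathcal{P}_r^n(\zeta, \vartheta)$ under a preservance weight $w \in P_n(\alpha)$, for any $\alpha \in \mathbb{R}_+$,
varies from a minimum of 0 to a maximum of $|\mathcal{P}_r^n(\zeta, \vartheta)| - 1$.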

Recalling the operational nature of an isolated neuron, as detailed
in Equation 3.1, the output $y$, as a function of the input pattern $x$, is
realized as the effect of a transformation $\sigma$ on the inner product $w^T x$,
the latter being shifted (in range) by the threshold $\theta$. In the context of
pattern recognition and processor realization it is of interest to know the
nature of grouping in the collection of pre-images of the distinct labels
(or types of labels) that get assigned to the function. An investigation of
the discrete sequences that correspond to neuron outputs defined over
$\mathcal{P}_r^n(\zeta, \vartheta)$ leads to the following.

Proposition 3.2.4  Neurons with activation functions that are
sigmoidal (including hard-limiter) and a preservance weight $w \in P_n(\alpha)$,
for any $\alpha \in \mathbb{R}_+$, induce no more than one sign transition in the (finite
length) sequence over $\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$ representing functions over
$\mathcal{P}_r^n(\zeta, \vartheta)$, $n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \mathbb{R}_+$, $\vartheta \in \mathbb{R}^n$.
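The content of Proposition 3.2.4 can be illustrated on the four points of the Boolean square. The sketch below is illustrative only: the function names, the weight $w = (2, 1)$ (taken here as a preservance weight for $n = 2$, consistent with Table 3.2), the chosen thresholds and the convention for the value of the hard-limiter at zero are all assumptions.

```python
def hard_limiter(u, low=-1, high=1):
    """Bipolar hard-limiting activation; the value at u == 0 is an arbitrary choice."""
    return high if u >= 0 else low


def sign_transitions(sequence):
    """Number of positions at which consecutive entries differ."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


def neuron_sequence(points, w, theta):
    """Neuron outputs listed in increasing order of the projection w.x."""
    projections = sorted(sum(wi * xi for wi, xi in zip(w, x)) for x in points)
    return [hard_limiter(p - theta) for p in projections]


square = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
w = (2, 1)                      # assumed preservance weight for n = 2
thresholds = [-4, -2, 0, 2, 4]
counts = [sign_transitions(neuron_sequence(square, w, t)) for t in thresholds]

# Parity (XOR) labelling of the same points, in the same projection order.
xor_sequence = [-1, 1, 1, -1]
xor_transitions = sign_transitions(xor_sequence)
```

No threshold yields more than one sign transition, whereas the parity labelling needs two and is therefore not realized by the isolated neuron with this weight.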
The above statement, in the context of activation functions $\sigma$ that
are hard-limiting, is equivalent to linear separability and provides
a reasonable extension of the notion of linear separability when $\sigma$ is
sigmoidal. Separation of the pre-images of the distinct types of labels,
interpreted in terms of the number of sign-crossings,* is the key to
processor realization in neural networks. Discreteness in $\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$
restricts variational consideration in functions over $\mathcal{P}_r^n(\zeta, \vartheta)$ to sign
transitions rather than zero crossings even though $\sigma$ is defined to be
continuous. In the following, the number of sign transitions in the discrete
sequences over $\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$ is termed the order of separability.
The collection of order-0 and order-1 separable sequences (dichotomies)
is termed linearly separable. Continuing with the characterization of
representation, the following result is obtained.

Proposition 3.2.5  The number of binary valued functions over the
discrete space $\mathcal{P}_r^n(\zeta, \vartheta)$ that have exactly $p$ sign transitions in the
equivalent sequence under any preservance weight in $P_n$ is given by

$$2 \binom{|\mathcal{P}_r^n(\zeta, \vartheta)| - 1}{p},$$

$n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \mathbb{R}_+$, $\vartheta \in \mathbb{R}^n$ and $p = 0, 1, \ldots, |\mathcal{P}_r^n(\zeta, \vartheta)| - 1$.
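The count in Proposition 3.2.5 can be checked by exhaustive enumeration for a small number of points. In the sketch below (names are illustrative), a binary sequence of length $N$ is fixed by its first entry (2 choices) and by the positions of its $p$ transitions among the $N - 1$ gaps, which is the combinatorial argument assumed for the count $2\binom{N-1}{p}$.

```python
from itertools import product
from math import comb


def sign_transitions(sequence):
    """Number of positions at which consecutive entries differ."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


def count_by_enumeration(num_points):
    """counts[p] = number of bipolar sequences with exactly p sign transitions."""
    counts = [0] * num_points
    for sequence in product((-1, 1), repeat=num_points):
        counts[sign_transitions(sequence)] += 1
    return counts


def count_by_formula(num_points):
    """2 * C(N - 1, p) for p = 0, ..., N - 1."""
    return [2 * comb(num_points - 1, p) for p in range(num_points)]


enumerated = count_by_enumeration(6)
predicted = count_by_formula(6)
```

Summing the counts over all $p$ recovers $2^N$, the total number of binary sequences of length $N$.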

In the above statement, though $\sigma$ induces only a single sign
transition over its domain, multiple sign transitions in the sequences over
$\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$ representing functions over $\mathcal{P}_r^n(\zeta, \vartheta)$ are implied by a
lack of monotonicity in the sequence over $\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$ representing
points in $\mathcal{P}_r^n(\zeta, \vartheta)$ under a preservance weight $w \in P_n(\alpha)$, $\alpha \in \mathbb{R}_+$. (See
Figure 3.6 (p. 140) for a clarification of the non-monotonicity in the
discrete sequence over $\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$. All sequences in this figure refer to
the input space $\mathcal{P}_1^2(1, 0)$. The example labeled (d) refers to the familiar
parity (XOR) problem.) Table 3.4 (p. 144) and Figure 3.7 (p. 145)
indicate the number of binary functions over $\mathcal{P}_r^n(1, 0)$ equivalent to binary
sequences with exactly $p$ sign transitions expressed as a ratio, denoted
by $\rho$, of the number of distinct binary functions (see the corollary to
Proposition 3.2.1 (p. 139)) over the same discrete space for different
values of $n$, $r$ and $p$:

$$\rho = \frac{2 \binom{|\mathcal{P}_r^n(1, 0)| - 1}{p}}{2^{|\mathcal{P}_r^n(1, 0)|}}.$$

*In the general case, separation of pre-images is interpreted in terms of level crossings,
the level being set to be a mean of the values (or collection of values) that represent the
distinct labels.

It is evident from the tabulated values as well as the accompanying
graph that the population of (binary) functions with exactly $p$ sign
transitions expressed as a ratio of the number of possible functions
shrinks, in general, as the dimensionality $n$ and ranking $r$ are
increased for all values of $p$, $p = 0, 1, \ldots, |\mathcal{P}_r^n(\zeta, \vartheta)| - 1$. The reason for
this behaviour is simply a consequence of the structure of the function
$2^{-k} \binom{k}{p}$, $p = 0, 1, 2, \ldots, k$, $k = 1, 2, \ldots$, as indicated in Figure 3.8: values
in the isocurves parallel to the coordinate axes show a near binomial
distribution. The ratio used above is the mass function (or relative
frequency) of function realization with isolated neurons.
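The shrinkage of the relative population can be made concrete with exact rational arithmetic. The following sketch is a hedged illustration: it assumes that the ratio is the count $2\binom{N-1}{p}$ of Proposition 3.2.5 divided by the total $2^N$ of binary functions, with $N = |\mathcal{P}_r^n(1, 0)|$; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def cardinality(n, r):
    """Number of points in the discrete space, 2**n * (2**(n*r) - 1) / (2**n - 1)."""
    return 2 ** n * (2 ** (n * r) - 1) // (2 ** n - 1)


def rho(n, r, p):
    """Assumed ratio: functions with exactly p sign transitions over all binary functions."""
    size = cardinality(n, r)
    return Fraction(2 * comb(size - 1, p), 2 ** size)


def linearly_separable_fraction(n, r):
    """Relative population of order-0 and order-1 separable functions."""
    return rho(n, r, 0) + rho(n, r, 1)


fractions_by_space = {(n, r): linearly_separable_fraction(n, r)
                      for n in (1, 2, 3) for r in (1, 2)}
```

Under these assumptions every function on the two points of the case $n = 1$, $r = 1$ is linearly separable, while for $n = 2$ the fraction falls from $1/2$ at $r = 1$ to $40/2^{20}$ at $r = 2$.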



144 


Chapter 3 Processor Representation in Isolated Neurons 


Table 3.4: Population of binary functions over $\mathcal{P}_r^n(1, 0)$ with order-$p$
separability relative to binary functions over $\mathcal{P}_r^n(1, 0)$

Figure 3.8: Plot of $2^{-k} \binom{k}{p}$, $p = 0, 1, 2, \ldots, k$, $k = 1, 2, \ldots$

Proposition 3.2.6  The population of binary functions over $\mathcal{P}_r^n(\zeta, \vartheta)$,
$n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \mathbb{R}_+$, $\vartheta \in \mathbb{R}^n$, with order of separability
no more than $p = 0, 1, \ldots$ in the equivalent sequences obtained under
preservance weights drawn from $P_n$ decreases as either $n$, or $r$, or both
increase for any given value of $p$.


In the above statement, $n$ and $r$ are assumed to be chosen such that
the number of sign transitions $p$ does not exceed $|\mathcal{P}_r^n(\zeta, \vartheta)|$, the number
of distinct points in the discrete input space denoted by $\mathcal{P}_r^n(\zeta, \vartheta)$. As a
consequence of the above Proposition, even if the activation function is
revised to accommodate multiple level-crossings over its domain, as eg,
(range scaled and translated) Gaussian functions

= (C+ - C-) ea;p(^) + C-, € U, C-, C+ G !R, C- < C+, 
or (range scaled and translated) Walsh functions
r C+ for [-1,1), 

|C- for^ 

= crW(2^-l) + (-l)'^VW{2^ + l) + 2C-,V^ e3t, 
C-,C+ e3fi,C- <C+,i =o,i,i = o,i, , 


the proportion of (binary valued) processors realized with such an
augmented isolated neuron in relation to the number of processors to be
realized always decreases as the number of points in the input space
increases (due to the dimensionality $n$ as well as ranking $r$). In this
thesis, however, $\sigma$ is assumed to have only one zero crossing and thus only
the representation of linearly separable functions will be considered.


Proposition 3.2.7  The number of binary functions over $\mathcal{P}_r^n(\zeta, \vartheta)$,
$n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \mathbb{R}_+$ and $\vartheta \in \mathbb{R}^n$, that are linearly separable
given any preservance weight in $P_n$ is

$$\frac{2^{n+1} (2^{nr} - 1)}{2^n - 1}.$$




This statement follows from Proposition 3.2.4 (p. 141) and
Proposition 3.2.5 (p. 142). (An idea of the number of linearly separable
functions realized by an isolated neuron incorporating any preservance
weight is obtained on doubling the entries of Table 3.3 for the
different values of $n$ and $r$.) From Proposition 3.2.1 (p. 138) the
finite length sequences equivalent to (discrete) functions defined over
$\mathcal{P}_r^n(\zeta, \vartheta) \subset \mathbb{R}^n$, for given (and admissible) values of $n$, $r$, $\zeta$ and $\vartheta$, are
embedded in univariate functions over $\mathcal{L}_w$, $w \in P_n$. It is important to
note that the single variable of the functions defined on $\mathcal{L}_w$ so as to be
equivalent to functions over $\mathbb{R}^n$ is a representation of the closeness, as
measured through the inner product operation, between the incident
pattern $x$ and the preservance weight $w$. Functions defined over $\mathcal{L}_w$
to be representative of (multi-variate) functions defined over the
discrete input space $\mathcal{P}_r^n(\zeta, \vartheta)$ exhibit the following feature as a result of
the operation of inner product being a continuous mapping.

Proposition 3.2.8  Under any weight $w \in \mathbb{R}^n$ continuous functions
defined over $\mathbb{R}^n$ and taking values in $Y$, $Y$ being a sub-interval of $\mathbb{R}$, are
represented by the operation of inner product as continuous functions
over $\mathcal{L}_w$.

Continuing with bipolar bivalent activation functions, sign
transitions instrumental in the characterization of (binary) functions over
$\mathcal{P}_r^n(\zeta, \vartheta)$ through the equivalent sequences defined over the discrete
subset of preservation points in $\mathcal{L}_w$ are possible only through zero crossings.
In the following, the zero crossings of functions over $\mathcal{L}_w$ are assumed
to be at points and not through intervals, ie, the support of the output
value 0 is of null Lebesgue measure. Figure 3.9 (p. 149) illustrates
the preservation points of $\mathcal{P}_3^2(1, 0)$ in $\mathcal{L}_w$: the contributions of Boolean
hyper-cubes corresponding to the different values of $r$, $r = 1, 2, 3$, are
also indicated. Zero crossings are allowed in the intervals between
adjacent preservation points.

Figure 3.9: Preservation points of $\mathcal{P}_3^2(1, 0)$ in $\mathcal{L}_w$, $w \in P_2$

As indicated in the above Figure, the spacing between adjacent
preservation points in $\mathcal{L}_w$ for a preservance weight $w \in P_n(\alpha)$, $\alpha \in \mathbb{R}_+$,
is not uniform when $r$ is assigned values that are larger than unity. The
length of the intervals corresponding to the projection points of (C, 2?) 
in Cyj is either or one half this value: the latter is applicable 

only when $r = 2, 3, \ldots$. As the lengths have a common factor, the
admissible intervals of zero crossings will be denoted, for $n = 1, 2, \ldots$;
$r = 1, 2, \ldots$; $\zeta \in \mathbb{R}_+$ and $\vartheta \in \mathbb{R}^n$, by

= C2-"'’h-1.0-C^2^^(7rrr+mj2, 
t = l,2,...2(|P,"(C,2i)| - i7^”-i(C2Z)l - 1). (3.4) 

where $\beta_1 A + \beta_2$ is the set $\{x = \beta_1 y + \beta_2 \mid y \in A\}$ for all $\beta_1, \beta_2 \in \mathbb{R}$ and
$|\mathcal{P}_0^n(\zeta, \vartheta)| = 0$. The zero crossing intervals ${}_w\Theta_r^n(\alpha, \zeta, \vartheta)(t)$ for the
values of $t$ indicated in Equation 3.4 are considered as proper subsets of
$\mathcal{L}_w$, ie, a set isomorphic to $\mathbb{R}$, however, oriented in the
direction of the preservance weight $w$.

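Proposition 3.2.9  Functions on $\mathcal{L}_w$, under a preservance weight
$w \in P_n(\alpha)$, for any $\alpha \in \mathbb{R}_+$, embed the representation of functions
over $\mathcal{P}_r^n(\zeta, \vartheta)$, $n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \mathbb{R}_+$ and $\vartheta \in \mathbb{R}^n$, which
have an odd number of zero crossings in any interval ${}_w\Theta_r^n(\alpha, \zeta, \vartheta)(t)$,
$t = 1, 2, \ldots, 2(|\mathcal{P}_r^n(\zeta, \vartheta)| - |\mathcal{P}_{r-1}^n(\zeta, \vartheta)| - 1)$, as a single sign transition
between the points in $\mathcal{L}_w$ contained in the boundary of ${}_w\Theta_r^n(\alpha, \zeta, \vartheta)(t)$, ie,
$\overline{{}_w\Theta_r^n(\alpha, \zeta, \vartheta)(t)} \setminus {}_w\Theta_r^n(\alpha, \zeta, \vartheta)(t)$, where $\overline{{}_w\Theta_r^n(\alpha, \zeta, \vartheta)(t)}$ is the closure of
${}_w\Theta_r^n(\alpha, \zeta, \vartheta)(t)$.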

In the ensuing discussion, however, at most one zero crossing is
assumed in the intervals between adjacent preservation points of $\mathcal{P}_r^n(\zeta, \vartheta)$
in $\mathcal{L}_w$. Though the actual number of intervals where zero crossings are
admitted is given by $|\mathcal{P}_r^n(\zeta, \vartheta)| - 1$, Equation 3.4 suggests a larger
number while maintaining uniformity in length. A few properties of the
zero crossing intervals are listed in the following.

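Proposition 3.2.10  Given any preservance weight $w \in P_n(\alpha)$, $\alpha \in \mathbb{R}_+$,
the zero crossing intervals, in $\mathcal{L}_w$, corresponding to functions over
$\mathcal{P}_r^n(\zeta, \vartheta)$, $n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \mathbb{R}_+$ and $\vartheta \in \mathbb{R}^n$, have the following
characteristics with $\mu$ denoting Lebesgue measure (defined on $\mathcal{L}_w$) and
$t = 1, 2, \ldots, 2(|\mathcal{P}_r^n(\zeta, \vartheta)| - |\mathcal{P}_{r-1}^n(\zeta, \vartheta)| - 1)$.

1. Interval shrinkage with dimensionality and ranking:

$n_1 > n_2$ or $r_1 > r_2$ implies $\mu({}_w\Theta_{r_1}^{n_1}(\alpha, \zeta, \vartheta)(t)) < \mu({}_w\Theta_{r_2}^{n_2}(\alpha, \zeta, \vartheta)(t))$,
for all $n_1, n_2 = 1, 2, \ldots$ and $r_1, r_2 = 1, 2, \ldots$.

2. Interval dilation with scaling:

$\mu({}_w\Theta_r^n(\alpha, \zeta, \vartheta)(t))$ increases as $\zeta$ or $\alpha$ increase.

3. Independence of interval length to translation:

$\mu({}_w\Theta_r^n(\alpha, \zeta, \vartheta)(t))$ is independent of $\vartheta$ for all $\vartheta \in \mathbb{R}^n$.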

4. Subdivision of intervals with ranking: 

1=1 J -1 

r = 2,3,.. 

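Functions over $\mathcal{P}_r^n(\zeta, \vartheta)$ and $\mathbb{R}^n$, $n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \mathbb{R}_+$ and
$\vartheta \in \mathbb{R}^n$, have until now been represented in $\mathcal{L}_w$ using a generic
preservance weight $w \in P_n(\alpha)$, for any $\alpha \in \mathbb{R}_+$. Recalling the multiplicity of
preservance weights for $\mathcal{P}_r^n(\zeta, \vartheta)$ given an $\alpha$, as established in § 3.1, the
following hold.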

yiR070S0730X 3.2.11 For every function /• — > [C-,C+L = 

£_^ and £^(|N| d)) = -C-uidkll lii £ and all 

admissible values ofn, r, ^ and d. 


This statement is obvious noting that the spaces $\mathcal{L}_w$ and $\mathcal{L}_{-w}$ are
defined to capture traversal in the direction of the specified weight $w$.
Note that in $\mathcal{L}_w(\alpha, A)$, the notation for the collection of preservation
points in $\mathcal{L}_w$ of a discrete space $A$ under a preservance weight $w$, the
suffix $w$ denotes the orientation of the collection of preservation points
in $\mathcal{L}_w$ and $\alpha$ denotes the (positive) factor by which the basic vector* in
the direction of $w$ needs to be scaled to get the desired weight. For weights
$w \in \mathbb{R}^n$ the basic vector in the direction of $w$ will be considered to be
the same as the unit vector in the direction of $w$ and for this reason
the scale factor $\alpha$ will be equated to the norm $\|w\|$ of the weight $w$. It
is important to recognize that sequences defined on $\mathcal{L}_w(\|w\|, \mathcal{P}_r^n(\zeta, \vartheta))$
relate to the sequences on $\mathcal{L}_{-w}(\|w\|, \mathcal{P}_r^n(\zeta, \vartheta))$ in the following manner.

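Proposition 3.2.12  For every function $f: \mathcal{P}_r^n(\zeta, \vartheta) \to [\zeta_-, \zeta_+]$ realized
by an isolated neuron with a weight $w \in \mathbb{R}^n$, the function realized by the
weight $-w$, denoted by $f_{-}: \mathcal{P}_r^n(\zeta, \vartheta) \to [\zeta_-, \zeta_+]$, is a complement of $f$ in
the sense that

$$\forall x \in \mathcal{P}_r^n(\zeta, \vartheta): \quad f(x) + f_{-}(x) = \zeta_+ + \zeta_-.$$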
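The complementarity stated in Proposition 3.2.12 can be illustrated with a hard-limiting neuron. The sketch below rests on assumptions not fixed by the statement: the threshold is negated together with the weight, the threshold is chosen so that no projection falls exactly on it, and the names, weight and output levels are hypothetical.

```python
def neuron(x, w, theta, low=-1.0, high=1.0):
    """Hard-limiting neuron: `high` if w.x - theta > 0, otherwise `low`."""
    u = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return high if u > 0 else low


square = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
w = (2.0, 1.0)                       # assumed preservance weight for n = 2
theta = 0.5                          # lies strictly between two projections
negated_w = tuple(-wi for wi in w)

low, high = 0.0, 1.0                 # any pair of output levels with low < high
sums = [neuron(x, w, theta, low, high) + neuron(x, negated_w, -theta, low, high)
        for x in square]
```

Every entry of `sums` equals `low + high`, so the function realized with $-w$ takes the upper level exactly where the function realized with $w$ takes the lower one.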

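Proposition 3.2.13  For every function $f: \mathcal{P}_r^n(\zeta, \vartheta) \to [\zeta_-, \zeta_+]$ there
exist preservance weights $w_1, w_2 \in P_n(\alpha)$, $w_1 \neq w_2$ and $w_2 \neq -w_1$,
such that for any $\alpha \in \mathbb{R}_+$ the sequences over $\mathcal{L}_{w_1}(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$ and
$\mathcal{L}_{w_2}(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$ representing $f$ are identical when expressed in terms
of a traversal over the spaces $\mathcal{L}_{w_1}(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$ and $\mathcal{L}_{w_2}(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$,
respectively, rather than over the domain space of $f$.

Proof: The statement will be established through the following
example with the accompanying illustration.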

*For the preservance weights considered in the preceding discussion, the basic
vector is one of the $2^n n!$ weights provided by the assignment suggested in the proof of
Theorem 3.1.1 (p. 113).
For the discrete input set $\mathcal{P}_r^n(\zeta, \vartheta)$ with the associated hypotheses on
$n$, $r$, $\zeta$ and $\vartheta$, consider any preservance weight $w_1$ from the collection
$P_n(\alpha)$, for any appropriate value of $\alpha$, $\alpha \in \mathbb{R}_+$. Consider now the weight
$w_2$ constructed from $w_1$ as $w_{2,i} = w_{1,i}$, $i = 1, 2, \ldots, n$, $i \neq j$, and
$w_{2,j} = -w_{1,j}$ for some $j = 1, 2, \ldots, n$, or as $w_{2,i} = w_{1,i}$, $i = 1, 2, \ldots, n$,
$i \neq j_1, j_2$, and $w_{2,j_1} = w_{1,j_2}$, $w_{2,j_2} = w_{1,j_1}$ for some $j_1, j_2 = 1, 2, \ldots, n$.
It is obvious that $w_1 \in P_n(\alpha)$ implies $w_2 \in P_n(\alpha)$. If we now

consider sequences si^ B and 52: 

B such that Vx G f{x) = si^ = ^ = WLi j = 1^2 *hen 

we find that 

V£i,X 2 e (C> 22 ) (si., ,S1 ,j) e ( 52 ,, ,S2,j) € R 

for any (ordering) relation R C B x B, where ;i = Mii £1, t2 = i'll ai2> 
it = 2112 SJOid h = W.2 S.2- A typical example for the relation R is the 
’less than or equal to’ relation denoted by <. 

□ 
An immediate implication of the above statement is that the number
of linearly separable functions realizable by an isolated neuron is not
given simply as the number of distinct preservance weights (as given
by Proposition 3.1.2 (p. 119)) multiplied by the number of linearly
separable functions realized through any preservance weight (see
Proposition 3.2.7 (p. 147)), but is to be sought out through attempts at
understanding the (algebraic) structure of preservance weights. However, an
investigation into the algebraic properties of the class of preservance
weights is beyond the scope of this thesis.


3.3 Learning of Preservance Weights and
Generalization in Isolated Neurons

Functions realized by isolated neurons are decided by the weights $w$ and
threshold $\theta$, as discussed in Chapter 2. It is imperative that the weights
and threshold are automatically specified given information, in terms
of inputs with corresponding (possibly partially specified) outputs,
related to the required mapping: this automatic specification has been
termed learning. The collection of inputs, drawn from $\mathcal{P}_r^n(\zeta, \vartheta) \subset \mathbb{R}^n$ for
a suitable choice of $r$, $\zeta$ and $\vartheta$, together with the corresponding
requirement on the assignments to the output constitutes the training set. For
convenience, the collection of inputs contained in a training set will be
denoted by $\mathcal{T}_I \subset \mathcal{P}_r^n(\zeta, \vartheta)$ and the projection of these inputs along a
weight $w \in P_n(\alpha)$, for any $\alpha \in \mathbb{R}_+$, will be denoted by $\mathcal{L}_w(\alpha, \mathcal{T}_I)$.
A key issue in function realization through isolated neurons, in
addition to learning, is that of a specification of assignments to the output
corresponding to inputs not contained in the training set, ie,
generalization; this issue too influences the choice of weights and threshold.
Maintaining the tradition of neural network research, this discussion
will consider only the problem of specifying weights and thresholds in
the scope of learning and generalization and will not dwell on the issue
of specifying the nature of the activation function $\sigma$.

In view of the preservation of $\mathcal{P}_r^n(\zeta, \vartheta)$ in $\mathcal{L}_w$, specification of a
function over $\mathcal{P}_r^n(\zeta, \vartheta)$ is equivalent to the specification of a
univariate function over $\mathcal{L}_w$ at the finitely many distinct points given by
$\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta))$. Thus, generalization viewed as a problem of
incorporating the functional characteristics specified in the training set to
an input space equaling the entirety of either $\mathcal{P}_r^n(\zeta, \vartheta)$ or $\mathbb{R}^n$ reduces
to a problem of function extension (commonly addressed in discussions
of functional analysis) and, under the preservance weights,
generalization amounts to extending a function defined typically over a subset of
$\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta)) \subset \mathcal{L}_w$, for any $\alpha \in \mathbb{R}_+$, to the entirety of $\mathcal{L}_w$.

I will begin by considering the case wherein all inputs in $\mathcal{P}_r^n(\zeta, \vartheta)$
are considered, together with the corresponding outputs, in the training
set, ie, $\mathcal{T}_I = \mathcal{P}_r^n(\zeta, \vartheta)$. Generalization is restricted in this case to a
function extension from $\mathcal{P}_r^n(\zeta, \vartheta)$ to $\mathbb{R}^n$, equivalently from the discrete set of
preservation points $\mathcal{L}_w(\alpha, \mathcal{P}_r^n(\zeta, \vartheta)) \subset \mathcal{L}_w$ to the entirety of $\mathcal{L}_w$, where
$w \in P_n(\alpha)$ for any $\alpha \in \mathbb{R}_+$. For simplicity of reasoning, the activation
function $\sigma$ is assumed to be of the hard-limiting type (see § 2.2). Under
this assumption, functions that are defined over $\mathcal{P}_r^n(\zeta, \vartheta)$ are essentially
two-valued and the following definition is invoked for reasons of clarity
in the discussion.

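Definition 3.3.1  A function $f: \mathcal{P}_r^n(\zeta, \vartheta) \to \{\zeta_-, \zeta_+\}$, $\zeta_-, \zeta_+ \in \mathbb{R}$,
$\zeta_- < \zeta_+$, is termed a dichotomy.* If $\zeta_- = -\zeta_+$ the dichotomy is termed
bipolar. Dichotomies that are onto for the entire range space $\{\zeta_-, \zeta_+\}$
are termed non-trivial. In the context of processor representation with
(isolated) neurons, the mapping is considered surjective over a subset $\mathcal{T}_I$
of $\mathcal{P}_r^n(\zeta, \vartheta)$. A dichotomy is termed complete if the mapping is surjective
on the entirety of $\mathcal{P}_r^n(\zeta, \vartheta)$.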

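In an isolated neuron, weights $\underline{w}$ that preserve $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ in $\mathcal{L}_{\underline{w}}$ belong
to $P_n$; the cardinality of the class of preservance weights is more than
unity for all values of $n$. Though all weights in $P_n$ preserve the discrete
space $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$, the linear subspace $\mathcal{L}_{\underline{w}}$ (of $\Re^n$) is different for
weights $\underline{w}$ which differ in orientation. In the following, the dependence
of function representation on the choice of weights is formally stated.

PROPOSITION 3.3.1 The sequences over $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$ representing
functions (including dichotomies) over $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ under a preservance
weight $\underline{w}$ vary as $\underline{w}$ varies over $P_n$.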

Footnote: The notion of a dichotomy has already been made use of in Chapter 2 while detailing
the available characterization of isolated neurons and networks of such neurons.

This aspect has been dealt with in considerable detail in § 3.2. However,
the following is to be noted in the context of selecting weights in isolated
neurons: the first of these statements is equivalent to item 2 in
Proposition 3.2.10 (p. 150).

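PROPOSITION 3.3.2 Variation in preservance weights due to scaling,
as decided by $\alpha$, alters the sequences over $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$ (representing
functions over $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$) only in terms of scale and not in terms of sign
transitions.

PROPOSITION 3.3.3 Variation of weights $\underline{w}$ over $P_n(\alpha)$, for any $\alpha \in \Re^+$,
influences the number and location of sign transitions in the sequences
over $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$ representing functions over $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$.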

Invariance of the number and location of sign transitions (ie, zero
crossings) in the sequences to specific variation of preservance weights,
as indicated in Proposition 3.2.13 (p. 152), should, however, be noted.
Recalling the consequence of linear separability on the nature of sequences
admissible on $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$, as indicated by Proposition 3.2.4
(p. 141), the problem of learning of weights, corresponding to a linearly
separable dichotomy, in an isolated neuron is equivalent to one of finding
a weight $\underline{w}$ in $P_n$ such that the number of sign transitions in the
sequences over $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$ representing bipolar dichotomies
over $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ is no more than unity, thereby leading to the following.

PROPOSITION 3.3.4 A bipolar dichotomy $\mathfrak{D}: \mathcal{P}_r^n(\zeta,\underline{\vartheta}) \to \{\zeta_-,\zeta_+\}$ is
linearly separable if there exists a weight $\underline{w} \in P_n(\alpha)$, for any $\alpha \in \Re^+$,
such that no more than one sign transition occurs in the bipolar bivalent
sequence over $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$ representing $\mathfrak{D}$.
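
The sign-transition criterion can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name `sign_transitions`, the toy input space $\{-1,+1\}^2$ and the two example weights are hypothetical choices made here, not objects taken from the thesis.

```python
def sign_transitions(weight, points, labels):
    """Count sign changes in the label sequence ordered by the projection w.x."""
    # Project every input onto the weight and order the labels by projection.
    projected = sorted(
        (sum(w * x for w, x in zip(weight, p)), lab)
        for p, lab in zip(points, labels)
    )
    sequence = [lab for _, lab in projected]
    # A transition is any adjacent pair of labels that differ.
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


# Hypothetical toy data: the bipolar space {-1,+1}^2 and a dichotomy
# that takes the value -1 only at the point (1, -1).
points = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
labels = [1, -1, 1, 1]

print(sign_transitions((1, 2), points, labels))   # more than one transition
print(sign_transitions((1, -2), points, labels))  # a single transition
```

The same dichotomy therefore yields different numbers of sign transitions under weights that differ in orientation; only the second weight satisfies the "no more than one sign transition" condition.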

THEOREM 3.3.1 Bipolar dichotomies exhibiting an invariance in the
number of sign transitions for all preservance weights in $P_n(\alpha)$, for all
$\alpha \in \Re^+$, are either constant (and, hence, trivially linearly separable) or
not linearly separable, $n = 2, 3, \ldots$. There are only two linearly separable
bipolar dichotomies $\mathfrak{D}: \mathcal{P}_r^n(\zeta,\underline{\vartheta}) \to \{\zeta_-,\zeta_+\}$ of this kind, independent of the
dimensionality $n$, viz, functions which assign uniformly one of $\zeta_-$ and $\zeta_+$ to
all points in $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$.

PROOF: Invariance in the number of sign transitions to preservance
weights $\underline{w} \in P_n(\alpha)$, for any $\alpha \in \Re^+$, of a constant function is obvious.
The only bipolar bivalent functions that exhibit this feature are those
that assign, uniformly, one of $\pm 1$ to all points in $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$.

Functions that represent dichotomies that are linearly separable
cannot also exhibit invariance in the number of sign transitions to all
preservance weights in $P_n(\alpha)$, as established in the following. Given
that a weight $\underline{w}_1 \in P_n(\alpha)$ represents a linearly separable dichotomy,
consider, if possible, the same dichotomy to be represented by a weight
$\underline{w}_2$ derived from $\underline{w}_1$ as below:

$w_{2j} = $ [illegible in source] for some $j = 1, 2, \ldots, n$; $w_{2i} = w_{1i}$, $i = 1, 2, \ldots, n$, $i \neq j$.

As $j$ is varied, for at least one value of $j$ the sequence over the discrete
space $\mathcal{L}_{\underline{w}_2}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$ will have at least two sign transitions. This leads
to a contradiction and, thereby, the necessary statement is established.

The argument invoked above also shows that the only functions
that exhibit invariance in the number of sign transitions are those that
have an assignment that is invariant to permutations in the weights.
It is now simple to see that such functions are, indeed, not linearly
separable, a typical example being the parity function (XOR when the
input space dimensionality $n = 2$).

□ 


(The construction used in the proof of the above statement has already
been used in the proof of Proposition 3.2.13 (p. 152).)
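
The invariance exhibited by the parity function can be checked numerically on a toy case. The sketch below rests on an assumption made here for illustration: the candidate weights are taken to be the signed permutations of $(\alpha, 2\alpha, \ldots, 2^{n-1}\alpha)$, which matches the stated count of $n!\,2^n$ preservance weights and their closure under negation, but is not a transcription of the thesis's enumeration scheme. All function names are hypothetical.

```python
from itertools import permutations, product


def signed_permutation_weights(n, alpha=1.0):
    """Assumed candidate set: signed permutations of (alpha * 2**j)."""
    base = [alpha * 2 ** j for j in range(n)]
    return [
        tuple(s * v for s, v in zip(signs, perm))
        for perm in permutations(base)
        for signs in product((-1, 1), repeat=n)
    ]


def sign_transitions(weight, points, labels):
    projected = sorted(
        (sum(w * x for w, x in zip(weight, p)), lab)
        for p, lab in zip(points, labels)
    )
    sequence = [lab for _, lab in projected]
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


points = list(product((-1, 1), repeat=2))
# Bipolar XOR: +1 when the two coordinates differ in sign, -1 otherwise.
parity = [-x * y for x, y in points]

weights = signed_permutation_weights(2)
parity_counts = {sign_transitions(w, points, parity) for w in weights}
print(len(weights), parity_counts)
```

Under this assumed weight set the parity function shows the same number of sign transitions, greater than one, for every weight, which is the behaviour the theorem associates with dichotomies that are not linearly separable.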

PROPOSITION 3.3.5 For every non-trivial dichotomy $\mathfrak{D}: \mathcal{P}_r^n(\zeta,\underline{\vartheta}) \to
\{\zeta_-,\zeta_+\}$, $n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \Re^+$, $\underline{\vartheta} \in \Re^n$ and $\zeta_-,\zeta_+ \in \Re$,
$\zeta_- < \zeta_+$, there exist at least two weights in $P_n(\alpha)$, for any $\alpha \in \Re^+$, such
that the bipolar bivalent sequence over $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$ has more than
one sign transition.

PROOF: Refer to the proofs of Proposition 3.2.13 (p. 152) and Theorem
3.3.1 (p. 158). Given any non-trivial dichotomy $\mathfrak{D}$, there exists
a weight $\underline{w} \in P_n(\alpha)$ such that the sequence over $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$ has
at least two sign transitions: this is obvious in the event the given
dichotomy is not linearly separable, and Theorem 3.3.1 (p. 158) assures
the existence of such a weight in case of linearly separable dichotomies.
The argument of Proposition 3.2.13 (p. 152) is applicable to all bipolar
bivalent functions over $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ and assures the existence of the
second weight given a knowledge of a weight satisfying the stated
hypothesis. Note that the collection of preservance weights $P_n(\alpha)$, for any
$\alpha \in \Re^+$, is a complementary structure in the sense that
$\underline{w} \in P_n(\alpha) \Rightarrow -\underline{w} \in P_n(\alpha)$. Thus multiple sign transitions in the
sequences over $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$, corresponding to non-trivial bipolar
bivalent functions over $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$, are caused by at least four distinct
preservance weights (but two distinct orientations).

□ 

The above Proposition implies that an isolated neuron cannot realize
all dichotomies on $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$. (Such a result has long been known
in relation to dichotomies on $\mathcal{B}^n$.) In addition, the above Proposition
shows that only a subclass of $P_n$ need be searched for a solution to
the learning problem of linearly separable dichotomies on $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$: the
identification of this subclass is based on the number of sign transitions
(zero crossings) in the sequences over $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$, where $\underline{w}$ refers
to a candidate weight in $P_n$. The following definition is introduced to
provide a criterion to aid the selection of preservance weights.

DEFINITION 3.3.2 Given a function $f: \mathcal{P}_r^n(\zeta,\underline{\vartheta}) \to [\zeta_-,\zeta_+]$ (dichotomy
$\mathfrak{D}: \mathcal{P}_r^n(\zeta,\underline{\vartheta}) \to \{\zeta_-,\zeta_+\}$), a preservance weight $\underline{w} \in P_n(\alpha)$, for any $\alpha \in
\Re^+$, is termed admissible for $f$ ($\mathfrak{D}$) if, in the sequence over $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$
representing $f$ ($\mathfrak{D}$) over $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$, the number of sign transitions does not
exceed that accommodated by the activation function $\sigma$.

Proposition 3.3.4 (p. 158) provides an interpretation of linear separability
in terms of partitions on $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$ effected by sign
transitions (zero crossings). (In the same vein, the general situation of
order-p separability can be interpreted in terms of partitions induced
by level crossings. However, for reasons of convenience, only bipolar
dichotomies that are also linearly separable - ie, order-0 and order-1 - are
considered in this thesis.) The criterion of admissibility of weights
restricts the choice of preservance weights to those that ensure a realization
of the desired dichotomy. From the preceding discussion it is easy to
establish the following.

THEOREM 3.3.2 The problem of learning a given non-trivial complete
bipolar dichotomy $\mathfrak{D}: \mathcal{P}_r^n(\zeta,\underline{\vartheta}) \to \{\zeta_-,\zeta_+\}$ in an isolated neuron with a
hard-limiting activation function involves the two distinct steps:

1. Enumerate the weights $\underline{w} \in P_n(\alpha)$, for any suitable $\alpha \in \Re^+$, till
either $\underline{w}$ is admissible for $\mathfrak{D}$ (ie, the number of sign transitions in
the sequence over $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$ representing the given dichotomy
does not exceed unity) or the space of preservance weights is exhausted.
In the latter case the dichotomy is not linearly separable.
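2. In the event a preservance weight admissible for $\mathfrak{D}$ is obtained (ie,
the given dichotomy is linearly separable) assign a value to the
threshold $\theta$ from the set $\mathcal{I}_{\underline{w}}(\alpha,\zeta,\underline{\vartheta})(i)$ where $i$, the index of sign
transition location, is identified by the relation

$$\mathfrak{D}\Big|_{(\underline{w}\cdot\mathcal{P}_r^n(\zeta,\underline{\vartheta}))\,\cap\,\bigcup_{j=1}^{i-1}\mathcal{I}_{\underline{w}}(\alpha,\zeta,\underline{\vartheta})(j)} = -\,\mathfrak{D}\Big|_{(\underline{w}\cdot\mathcal{P}_r^n(\zeta,\underline{\vartheta}))\,\cap\,\bigcup_{j=i+1}^{2(|\mathcal{P}_r^n(\zeta,\underline{\vartheta})|-|\mathcal{P}_{r-1}^n(\zeta,\underline{\vartheta})|-1)}\mathcal{I}_{\underline{w}}(\alpha,\zeta,\underline{\vartheta})(j)}$$

where $i = 1, 2, \ldots, 2(|\mathcal{P}_r^n(\zeta,\underline{\vartheta})| - |\mathcal{P}_{r-1}^n(\zeta,\underline{\vartheta})| - 1)$ and $\underline{w}\cdot A$ denotes
the collection of inner products, with $\underline{w}$, of the elements of $A$, for
any $A \subset \Re^n$; $\underline{w}\cdot A \subset \Re$.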


The above theorem shows that learning involves an enumerative
procedure for weights (see Figure 3.2 (p. 122)) and a search for the
threshold in a linearly ordered space and, thereby, provides the basis
for an algorithm for learning in isolated neurons: the details of such
an algorithm will not, however, be taken up in this discussion. Any
element of the interval $\mathcal{I}_{\underline{w}}(\alpha,\zeta,\underline{\vartheta})(i)$, where $i$ is determined as indicated in
step 2 of the above theorem, is allowed to be a candidate for the threshold.
No commitment is, however, made on the nature of assignment to $\mathfrak{D}$
over the interval $\mathcal{I}_{\underline{w}}(\alpha,\zeta,\underline{\vartheta})(i)$, $i = 1, 2, \ldots, 2(|\mathcal{P}_r^n(\zeta,\underline{\vartheta})| - |\mathcal{P}_{r-1}^n(\zeta,\underline{\vartheta})| - 1)$,
which contains the threshold $\theta$. As a consequence of Proposition 3.2.13
(p. 152) the following is important to note.

Footnote: Note that $\mathfrak{D}$ is a bipolar dichotomy.

Footnote: If the interval is defined to be left closed, the function represented will be right
continuous.

PROPOSITION 3.3.6 The preservance weights $\underline{w} \in P_n(\alpha)$, for any
$\alpha \in \Re^+$, admissible to a given bipolar dichotomy $\mathfrak{D}$ are not unique.
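
The two-step procedure (weight enumeration followed by threshold selection) and the non-uniqueness of admissible weights can be sketched as follows. This is a minimal sketch under assumptions made here, not the thesis's algorithm: candidate weights are assumed to be the signed permutations of $(\alpha, 2\alpha, \ldots)$, the threshold is taken as the midpoint of the transition interval (any element of that interval would do), and only weights whose sequence runs from $-1$ to $+1$ are kept so that the neuron output is $\mathrm{sgn}(\underline{w}\cdot\underline{x} - \theta)$; closure of the weight set under negation makes that restriction harmless. All names are hypothetical.

```python
from itertools import permutations, product


def candidate_weights(n, alpha=1.0):
    """Assumed candidate set: signed permutations of (alpha * 2**j)."""
    base = [alpha * 2 ** j for j in range(n)]
    return [
        tuple(s * v for s, v in zip(signs, perm))
        for perm in permutations(base)
        for signs in product((-1, 1), repeat=n)
    ]


def project(weight, point):
    return sum(w * x for w, x in zip(weight, point))


def learn(dichotomy, alpha=1.0):
    """Return every (weight, threshold) pair realizing a non-trivial dichotomy."""
    n = len(next(iter(dichotomy)))
    solutions = []
    for w in candidate_weights(n, alpha):
        ordered = sorted(dichotomy, key=lambda p: project(w, p))
        seq = [dichotomy[p] for p in ordered]
        changes = [k for k in range(1, len(seq)) if seq[k] != seq[k - 1]]
        # Admissible here: exactly one transition, running from -1 to +1.
        if len(changes) == 1 and seq[0] < seq[-1]:
            k = changes[0]
            theta = 0.5 * (project(w, ordered[k - 1]) + project(w, ordered[k]))
            solutions.append((w, theta))
    return solutions


def neuron(weight, theta, point):
    """Hard-limiting neuron with outputs in {-1, +1}."""
    return 1 if project(weight, point) > theta else -1


# Hypothetical toy dichotomy on {-1,+1}^2.
dichotomy = {(-1, -1): 1, (1, -1): -1, (-1, 1): 1, (1, 1): 1}
solutions = learn(dichotomy)
for w, theta in solutions:
    ok = all(neuron(w, theta, p) == v for p, v in dichotomy.items())
    print(w, theta, ok)
```

More than one weight is returned for this dichotomy, in line with the non-uniqueness noted above; an empty list would indicate that no enumerated weight is admissible.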


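In order that the admissibility of a preservance weight selected
through the learning procedure is not altered, generalization, viewed as
a situation of function extension from $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$, for any $\alpha \in \Re^+$,
$n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \Re^+$, $\underline{\vartheta} \in \Re^n$, to $\mathcal{L}_{\underline{w}}$, is expected to
preserve the number and location, at least to the extent of the interval
$\mathcal{I}_{\underline{w}}(\alpha,\zeta,\underline{\vartheta})(i)$, $i = 1, 2, \ldots, 2(|\mathcal{P}_r^n(\zeta,\underline{\vartheta})| - |\mathcal{P}_{r-1}^n(\zeta,\underline{\vartheta})| - 1)$, of sign
transitions (ie, zero crossings) in the assignments made to points in $\mathcal{L}_{\underline{w}}$ given
the assignments to points in the discrete space $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$. Note
that in view of Proposition 3.3.3 (p. 157), the index of the interval in
which the threshold belongs is not altered by $\alpha$, $\alpha \in \Re^+$, though the
specific nature of the interval is very much influenced by $\alpha$.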

Having discussed the case wherein the training set consists of a
complete dichotomy, I will now consider the more general and realistic
case wherein the training set does not contain all points of $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$,
ie, $\mathcal{T} \subset \mathcal{P}_r^n(\zeta,\underline{\vartheta})$, the containment being proper. As a
consequence the collection of inputs contained in the training set, when
projected along any preservance weight $\underline{w} \in P_n(\alpha)$, for any $\alpha \in \Re^+$, will
form a proper subset (viz, $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{T})$) of the discrete set $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta})) \subset
\mathcal{L}_{\underline{w}}$. The weight and threshold obtained in the learning of a dichotomy
on $\mathcal{T}$ are expected to be related to those obtained in the learning of a
dichotomy on $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$.
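Generalization, in this case, involves two components: one of
extending functions defined on $\mathcal{T}$ to $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$, equivalently extending
functions defined on $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{T})$ to $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$, and the other of
extending functions defined on $\mathcal{T}$ to $\Re^n \setminus \mathcal{P}_r^n(\zeta,\underline{\vartheta})$, equivalently extending
functions defined on $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{T})$ to $\mathcal{L}_{\underline{w}} \setminus \mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$. Here again,
the operational criterion of generalization is a preservation of the number,
and location, of sign transitions in the function over $\mathcal{L}_{\underline{w}}$ given the
assignment over $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{T})$, $\underline{w} \in P_n(\alpha)$ for any $\alpha \in \Re^+$. Additionally, the
process of generalization is also expected to preserve the number and
location of sign transitions in the sequence over $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$ given
the assignment over $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{T})$.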

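PROPOSITION 3.3.7 Learning a bipolar dichotomy $\mathfrak{D}: \mathcal{T} \to \{\zeta_-,\zeta_+\}$, wherein $\mathcal{T} \subset
\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ for admissible values of $n$, $r$, $\zeta$ and $\underline{\vartheta}$, is equivalent to the
problem of learning a complete bipolar dichotomy once the assignments
to points in $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta})) \setminus \mathcal{L}_{\underline{w}}(\alpha,\mathcal{T})$, $\underline{w} \in P_n(\alpha)$ for any $\alpha \in \Re^+$, are
completed by the process of generalization.

PROPOSITION 3.3.8 Given a bipolar dichotomy $\mathfrak{D}: \mathcal{T} \to \{\zeta_-,\zeta_+\}$, $\mathcal{T} \subset \mathcal{P}_r^n(\zeta,\underline{\vartheta})$,
$n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \Re^+$ and $\underline{\vartheta} \in \Re^n$, every preservance
weight $\underline{w} \in P_n(\alpha)$, for any $\alpha \in \Re^+$, admissible for the bipolar dichotomy
$\tilde{\mathfrak{D}}: \mathcal{P}_r^n(\zeta,\underline{\vartheta}) \to \{\zeta_-,\zeta_+\}$ is also admissible for $\mathfrak{D}$, wherein $\tilde{\mathfrak{D}}|_{\mathcal{T}} = \mathfrak{D}$.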

The above Proposition suggests that the preservance weights admissible
in the representation of a dichotomy $\tilde{\mathfrak{D}}$ are inherited as admissible
preservance weights of the dichotomy $\mathfrak{D}$: with an interpretation of $\mathfrak{D}$
and $\tilde{\mathfrak{D}}$ as subsets of the Cartesian product $\mathcal{P}_r^n(\zeta,\underline{\vartheta}) \times \{\zeta_-,\zeta_+\}$ it is easy to
see that $\mathfrak{D} \subset \tilde{\mathfrak{D}}$. Generalization, in view of the above inheritance of
admissible preservance weights, is to play the role of ensuring that given a
dichotomy $\mathfrak{D}$, the dichotomy $\tilde{\mathfrak{D}}$ suggested as an extension of $\mathfrak{D}$ will have,
as admissible preservance weight(s), at least one of the several preservance
weights admissible to $\mathfrak{D}$. The converse of Proposition 3.3.8 holds
when the completion $\tilde{\mathfrak{D}}$ of a given dichotomy $\mathfrak{D}$ ensures that the number
of zero crossings in the sequence over the collection of projection points
$\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$, $\alpha \in \Re^+$, representing the desired function on $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$
is no different from that in the sequence over $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{T})$ representing the
given function on $\mathcal{T}$.
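
The inheritance of admissible weights under restriction to a training set, and the failure of the converse in general, can be seen on a toy case. The sketch below is illustrative only: the weight set (signed permutations of $(\alpha, 2\alpha, \ldots)$), the four-point input space and all names are assumptions introduced here.

```python
from itertools import permutations, product


def candidate_weights(n, alpha=1.0):
    """Assumed candidate set: signed permutations of (alpha * 2**j)."""
    base = [alpha * 2 ** j for j in range(n)]
    return [
        tuple(s * v for s, v in zip(signs, perm))
        for perm in permutations(base)
        for signs in product((-1, 1), repeat=n)
    ]


def sign_transitions(weight, assignment):
    """Sign changes in the sequence of assigned values ordered by projection."""
    ordered = sorted(assignment, key=lambda p: sum(w * x for w, x in zip(weight, p)))
    seq = [assignment[p] for p in ordered]
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


# Hypothetical complete dichotomy and a training set missing one point.
complete = {(-1, -1): 1, (1, -1): -1, (-1, 1): 1, (1, 1): 1}
training = {p: v for p, v in complete.items() if p != (-1, -1)}

# Dropping points from a sequence can never create new sign transitions.
inherited = all(
    sign_transitions(w, training) <= sign_transitions(w, complete)
    for w in candidate_weights(2)
)
print(inherited)

# The converse can fail: this weight is admissible on the training set only.
print(sign_transitions((1, 2), training), sign_transitions((1, 2), complete))
```

The last line shows a weight that meets the single-transition condition on the training set but not on the completed dichotomy, which is why the completion has to keep the number of zero crossings unchanged for the converse to hold.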

Learning, as an approach of specifying an admissible preservance
weight $\underline{w}$ given a partially specified function and identification of a
region, in $\mathcal{L}_{\underline{w}}$, within which the threshold is located, is not restricted to
the case of dichotomies alone and is easily extended to the more general
situation of representing processors mapping $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ to $[\zeta_-,\zeta_+] \subset \Re$ as
established in the following: the discussion is restricted to the case
of isolated neurons with sigmoidal activation function for reasons of
convenience. In this case too, the considerations of generalization are
identical to the situation wherein hard-limiting activation functions
are used. Processors are assumed, in the following, to realize bipolar
functions: this assumption allows learning to be characterized on the
lines of Theorem 3.3.2 (p. 161).
THEOREM 3.3.3 Representation of processors $f: \mathcal{T} \to [\zeta_-,\zeta_+]$
in an isolated neuron with a sigmoidal activation function
involves the two distinct steps:

1. Preservance weight enumeration - this is identical to step 1 in Theorem
3.3.2 (p. 161), except that the enumeration seeks a preservance
weight admissible to $f$.

2. Threshold range identification - this is similar to step 2 in Theorem
3.3.2 (p. 161). $\theta$, the threshold, is assigned a value from the
set $\mathcal{I}_{\underline{w}}(\alpha,\zeta,\underline{\vartheta})(i)$, the index $i$, $i = 1, 2, \ldots, 2(|\mathcal{P}_r^n(\zeta,\underline{\vartheta})| - |\mathcal{P}_{r-1}^n(\zeta,\underline{\vartheta})| - 1)$,
being given by


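$$\mathrm{sgn}(f)\Big|_{(\underline{w}\cdot\mathcal{T})\,\cap\,\bigcup_{j=1}^{i-1}\mathcal{I}_{\underline{w}}(\alpha,\zeta,\underline{\vartheta})(j)} = -\,\mathrm{sgn}(f)\Big|_{(\underline{w}\cdot\mathcal{T})\,\cap\,\bigcup_{j=i+1}^{2(|\mathcal{P}_r^n(\zeta,\underline{\vartheta})|-|\mathcal{P}_{r-1}^n(\zeta,\underline{\vartheta})|-1)}\mathcal{I}_{\underline{w}}(\alpha,\zeta,\underline{\vartheta})(j)}$$

where $\mathrm{sgn}(\cdot)$ is the sign function

$$\forall \xi \in \Re,\quad \mathrm{sgn}(\xi) = \begin{cases} -1 & \text{if } \xi < 0,\\ 0 & \text{if } \xi = 0,\\ +1 & \text{otherwise.} \end{cases}$$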


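A point to note in Theorem 3.3.2 (p. 161) and Theorem 3.3.3 above,
in particular the steps wherein the interval of sign transition (in $\mathcal{L}_{\underline{w}}$)
is identified, is that the index $i$, $i = 1, 2, \ldots, 2(|\mathcal{P}_r^n(\zeta,\underline{\vartheta})| - |\mathcal{P}_{r-1}^n(\zeta,\underline{\vartheta})| - 1)$, of
the set $\mathcal{I}_{\underline{w}}(\alpha,\zeta,\underline{\vartheta})(i)$ is, in general, not unique.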
Table 3.5 (p. 168) and Table 3.6 (p. 169) provide examples of functions
that are to be realized through an isolated neuron with a function
extension: these examples show associations between $\mathcal{P}_3^2(1,\underline{0})$ and
$\{-1,+1\}$. In the dichotomies indicated, members of the training set $\mathcal{T}$
are emboldened. Normal entries refer to the association resulting from
a generalization (function extension). An enumeration (see Figure 3.2
(p. 122) for the scheme) of the preservance weights of $\mathcal{P}_3^2(1,\underline{0})$ is provided
in Table 3.7 (p. 170). The ensuing discussion easily extends to
the case when the signs of the neuron outputs, rather than the outputs
themselves, are expected to be in the set $\{-1,+1\}$. (However, in the
latter case, it is important to recognize that the specific values realized
depend on the type of activation function $\sigma$.)

In an isolated neuron, a function on the discrete input space $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$
under a preservance weight $\underline{w}$ is equivalent to a sequence on the collection
of projection points $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$. Table 3.8 (p. 170) indicates
the sequences that would result due to functions $f_1$ and $f_2$ indicated
in Table 3.5 (p. 168) and Table 3.6 (p. 169). Sequences corresponding
to weights $\underline{w}^{\langle 0\rangle}$ and $\underline{w}^{\langle 1\rangle}$ only are considered, noting the equivalence
indicated in Proposition 3.2.13 (p. 152). $s_j^e$ in Table 3.8 denotes the
sequence over $\mathcal{L}_{\underline{w}^{\langle e\rangle}}(\alpha,\mathcal{P}_3^2(1,\underline{0}))$ representing the function
$f_j$, $j = 1, 2$, $e = 0, 1$. Table 3.8 indicates that the preservance weight
$\underline{w}^{\langle 0\rangle}$ is not admissible for function $f_1$ while $\underline{w}^{\langle 1\rangle}$ is admissible. In the
case of function $f_2$ no preservance weight in $P_n$ is admissible, ie, $f_2$ is
not a linearly separable dichotomy.
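
The behaviour described for $f_1$ and $f_2$ can be reproduced with a short Python sketch. Several ingredients are assumptions made here and labelled as such: the 84 input points are regenerated from the coordinates visible in Tables 3.5 and 3.6 (corner points at $\pm 1$, refined by offsets of $\pm 1/4$ and then $\pm 1/16$), the two functions are encoded by the quadrant pattern read off those tables, and the candidate weights are taken to be the signed permutations of $(1, 2)$ as a stand-in for the enumeration of Table 3.7, which is not reproduced in this text.

```python
from itertools import permutations, product

corners = list(product((-1.0, 1.0), repeat=2))


def refine(centres, scale):
    """Place a scaled copy of the corner pattern around every centre."""
    return {(cx + scale * dx, cy + scale * dy)
            for cx, cy in centres for dx, dy in corners}


# Assumed reconstruction of the tabulated input space (ranks 1, 2 and 3).
rank1 = set(corners)
rank2 = refine(rank1, 0.25)
rank3 = refine(rank2, 0.0625)
points = sorted(rank1 | rank2 | rank3)

# Quadrant patterns read off the tables (assumption of this sketch).
f1 = {p: (-1 if p[0] > 0 and p[1] < 0 else 1) for p in points}
f2 = {p: (-1 if p[0] * p[1] > 0 else 1) for p in points}

# Stand-in for Table 3.7: signed permutations of (1, 2).
weights = [tuple(s * v for s, v in zip(signs, perm))
           for perm in permutations((1.0, 2.0))
           for signs in product((-1, 1), repeat=2)]


def sign_transitions(weight, function):
    ordered = sorted(points, key=lambda p: weight[0] * p[0] + weight[1] * p[1])
    seq = [function[p] for p in ordered]
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


print(len(points))
print(sorted(sign_transitions(w, f1) for w in weights))
print(sorted(sign_transitions(w, f2) for w in weights))
```

Under these assumptions some weights give a single sign transition for $f_1$ while others do not, and every weight gives more than one transition for $f_2$, consistent with the statement that $f_2$ is not linearly separable.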
Table 3.5: Association $f_1 : \mathcal{P}_3^2(1,\underline{0}) \to \{-1,+1\}$

(-1.3125,-1.3125) ↦ +1     (-1.3125,-1.1875) ↦ +1     (-1.25,-1.25) ↦ +1
(-1.1875,-1.3125) ↦ +1     (-1.1875,-1.1875) ↦ +1     (-1.3125,-0.8125) ↦ +1
(-1.3125,-0.6875) ↦ +1     (-1.25,-0.75) ↦ +1         (-1.1875,-0.8125) ↦ +1
(-1.1875,-0.6875) ↦ +1     (-1,-1) ↦ +1               (-0.8125,-1.3125) ↦ +1
(-0.8125,-1.1875) ↦ +1     (-0.75,-1.25) ↦ +1         (-0.6875,-1.3125) ↦ +1
(-0.6875,-1.1875) ↦ +1     (-0.8125,-0.8125) ↦ +1     (-0.8125,-0.6875) ↦ +1
(-0.75,-0.75) ↦ +1         (-0.6875,-0.8125) ↦ +1     (-0.6875,-0.6875) ↦ +1
(-1.3125,0.6875) ↦ +1      (-1.3125,0.8125) ↦ +1      (-1.25,0.75) ↦ +1
(-1.1875,0.6875) ↦ +1      (-1.1875,0.8125) ↦ +1      (-1.3125,1.1875) ↦ +1
(-1.3125,1.3125) ↦ +1      (-1.25,1.25) ↦ +1          (-1.1875,1.1875) ↦ +1
(-1.1875,1.3125) ↦ +1      (-1,1) ↦ +1                (-0.8125,0.6875) ↦ +1
(-0.8125,0.8125) ↦ +1      (-0.75,0.75) ↦ +1          (-0.6875,0.6875) ↦ +1
(-0.6875,0.8125) ↦ +1      (-0.8125,1.1875) ↦ +1      (-0.8125,1.3125) ↦ +1
(-0.75,1.25) ↦ +1          (-0.6875,1.1875) ↦ +1      (-0.6875,1.3125) ↦ +1
(0.6875,-1.3125) ↦ -1      (0.6875,-1.1875) ↦ -1      (0.75,-1.25) ↦ -1
(0.8125,-1.3125) ↦ -1      (0.8125,-1.1875) ↦ -1      (0.6875,-0.8125) ↦ -1
(0.6875,-0.6875) ↦ -1      (0.75,-0.75) ↦ -1          (0.8125,-0.8125) ↦ -1
(0.8125,-0.6875) ↦ -1      (1,-1) ↦ -1                (1.1875,-1.3125) ↦ -1
(1.1875,-1.1875) ↦ -1      (1.25,-1.25) ↦ -1          (1.3125,-1.3125) ↦ -1
(1.3125,-1.1875) ↦ -1      (1.1875,-0.8125) ↦ -1      (1.1875,-0.6875) ↦ -1
(1.25,-0.75) ↦ -1          (1.3125,-0.8125) ↦ -1      (1.3125,-0.6875) ↦ -1
(0.6875,0.6875) ↦ +1       (0.6875,0.8125) ↦ +1       (0.75,0.75) ↦ +1
(0.8125,0.6875) ↦ +1       (0.8125,0.8125) ↦ +1       (0.6875,1.1875) ↦ +1
(0.6875,1.3125) ↦ +1       (0.75,1.25) ↦ +1           (0.8125,1.1875) ↦ +1
(0.8125,1.3125) ↦ +1       (1,1) ↦ +1                 (1.1875,0.6875) ↦ +1
(1.1875,0.8125) ↦ +1       (1.25,0.75) ↦ +1           (1.3125,0.6875) ↦ +1
(1.3125,0.8125) ↦ +1       (1.1875,1.1875) ↦ +1       (1.1875,1.3125) ↦ +1
(1.25,1.25) ↦ +1           (1.3125,1.1875) ↦ +1       (1.3125,1.3125) ↦ +1
Table 3.6: Association $f_2 : \mathcal{P}_3^2(1,\underline{0}) \to \{-1,+1\}$

(-1.3125,-1.3125) ↦ -1     (-1.3125,-1.1875) ↦ -1     (-1.25,-1.25) ↦ -1
(-1.1875,-1.3125) ↦ -1     (-1.1875,-1.1875) ↦ -1     (-1.3125,-0.8125) ↦ -1
(-1.3125,-0.6875) ↦ -1     (-1.25,-0.75) ↦ -1         (-1.1875,-0.8125) ↦ -1
(-1.1875,-0.6875) ↦ -1     (-1,-1) ↦ -1               (-0.8125,-1.3125) ↦ -1
(-0.8125,-1.1875) ↦ -1     (-0.75,-1.25) ↦ -1         (-0.6875,-1.3125) ↦ -1
(-0.6875,-1.1875) ↦ -1     (-0.8125,-0.8125) ↦ -1     (-0.8125,-0.6875) ↦ -1
(-0.75,-0.75) ↦ -1         (-0.6875,-0.8125) ↦ -1     (-0.6875,-0.6875) ↦ -1
(-1.3125,0.6875) ↦ +1      (-1.3125,0.8125) ↦ +1      (-1.25,0.75) ↦ +1
(-1.1875,0.6875) ↦ +1      (-1.1875,0.8125) ↦ +1      (-1.3125,1.1875) ↦ +1
(-1.3125,1.3125) ↦ +1      (-1.25,1.25) ↦ +1          (-1.1875,1.1875) ↦ +1
(-1.1875,1.3125) ↦ +1      (-1,1) ↦ +1                (-0.8125,0.6875) ↦ +1
(-0.8125,0.8125) ↦ +1      (-0.75,0.75) ↦ +1          (-0.6875,0.6875) ↦ +1
(-0.6875,0.8125) ↦ +1      (-0.8125,1.1875) ↦ +1      (-0.8125,1.3125) ↦ +1
(-0.75,1.25) ↦ +1          (-0.6875,1.1875) ↦ +1      (-0.6875,1.3125) ↦ +1
(0.6875,-1.3125) ↦ +1      (0.6875,-1.1875) ↦ +1      (0.75,-1.25) ↦ +1
(0.8125,-1.3125) ↦ +1      (0.8125,-1.1875) ↦ +1      (0.6875,-0.8125) ↦ +1
(0.6875,-0.6875) ↦ +1      (0.75,-0.75) ↦ +1          (0.8125,-0.8125) ↦ +1
(0.8125,-0.6875) ↦ +1      (1,-1) ↦ +1                (1.1875,-1.3125) ↦ +1
(1.1875,-1.1875) ↦ +1      (1.25,-1.25) ↦ +1          (1.3125,-1.3125) ↦ +1
(1.3125,-1.1875) ↦ +1      (1.1875,-0.8125) ↦ +1      (1.1875,-0.6875) ↦ +1
(1.25,-0.75) ↦ +1          (1.3125,-0.8125) ↦ +1      (1.3125,-0.6875) ↦ +1
(0.6875,0.6875) ↦ -1       (0.6875,0.8125) ↦ -1       (0.75,0.75) ↦ -1
(0.8125,0.6875) ↦ -1       (0.8125,0.8125) ↦ -1       (0.6875,1.1875) ↦ -1
(0.6875,1.3125) ↦ -1       (0.75,1.25) ↦ -1           (0.8125,1.1875) ↦ -1
(0.8125,1.3125) ↦ -1       (1,1) ↦ -1                 (1.1875,0.6875) ↦ -1
(1.1875,0.8125) ↦ -1       (1.25,0.75) ↦ -1           (1.3125,0.6875) ↦ -1
(1.3125,0.8125) ↦ -1       (1.1875,1.1875) ↦ -1       (1.1875,1.3125) ↦ -1
(1.25,1.25) ↦ -1           (1.3125,1.1875) ↦ -1       (1.3125,1.3125) ↦ -1

Table 3.8: Sequences representing functions $f_1$ and $f_2$

Unlike the hitherto accepted notion of learning by examples, the
approach to learning, in the sense of an automatic specification of weight
and threshold values given a dichotomy, even though discussed at the
level of isolated neurons, requires only a single instance of (valid)
assignment to each of the inputs included in the training set. The
procedure of specifying weight and threshold values, rather than being
iterative, is enumerative in nature and is quite attractive in situations
where on-line learning is necessary.


3.4 Preservation in Higher Radix Input Spaces 

Preservation of input spaces and the associated simplification in the
learning procedure have been discussed in the foregoing in the specific
case wherein the elements of the discrete space $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$, for admissible
values of $r$, $\zeta$ and $\underline{\vartheta}$, correspond to a binary number system. As
the notion of preservance is based on the operational correspondence
between inner product and positional numbering systems, it is readily
apparent that the notion of preservation of discrete input spaces by
appropriately chosen weights is not restricted to collections of binary
vectors, or vector collections derived from binary spaces. The following
observation of the preservation in numbering systems with a radix $\mathfrak{r}$ is
analogous to Theorem 3.1.1 (p. 113).


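Footnote: Note that $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ is constructed from scaled and translated versions of the space of
binary inputs $\mathcal{B}^n$.

THEOREM 3.4.1 A weight vector $\underline{w}$ given by $w_i = \alpha\,\mathfrak{r}^{i-1}$, $i = 1, 2, \ldots, n$,
for any $\alpha \in \Re^+$, preserves in $\mathcal{L}_{\underline{w}}$ all the $\mathfrak{r}^n$ points of $\{0, 1, \ldots, \mathfrak{r}-1\}^n$, a
discrete space of radix $\mathfrak{r}$, $\mathfrak{r} = 2, 3, \ldots$, for all $n = 1, 2, \ldots$.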
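
The correspondence with positional numbering can be checked directly: with weights $\alpha\,\mathfrak{r}^{i-1}$ the inner product of a digit vector is simply its value in radix $\mathfrak{r}$, so distinct points give distinct projections. The sketch below is a minimal check for one small case; the function names `radix_weight` and `is_preserved` are hypothetical.

```python
from itertools import product


def radix_weight(radix, n, alpha=1.0):
    """Weight vector w_i = alpha * radix**(i-1), i = 1..n."""
    return [alpha * radix ** i for i in range(n)]


def is_preserved(weight, points):
    """True when all points have distinct inner products with the weight."""
    projections = [sum(w * x for w, x in zip(weight, p)) for p in points]
    return len(set(projections)) == len(points)


radix, n = 3, 3
points = list(product(range(radix), repeat=n))  # {0, 1, 2}^3, 27 points
w = radix_weight(radix, n)

print(is_preserved(w, points))                # positional weights
print(is_preserved([1.0, 1.0, 1.0], points))  # equal weights collide
```

Equal weights are included only as a contrast: digit vectors with the same digit sum then project to the same point, so the one-one correspondence is lost.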


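In the following I will consider discrete spaces of radix
$\mathfrak{r}$, $\mathfrak{r} = 2, 3, \ldots$, the elements being real numbers.

$${}_{\mathfrak{r}}\mathcal{H} = \begin{cases} \left\{-\tfrac{\mathfrak{r}-1}{2}, \ldots, -\tfrac{3}{2}, -\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{3}{2}, \ldots, \tfrac{\mathfrak{r}-1}{2}\right\} & \text{if } \mathfrak{r} \text{ is even},\\[4pt] \left\{-\tfrac{\mathfrak{r}-1}{2}, \ldots, -2, -1, 0, 1, 2, \ldots, \tfrac{\mathfrak{r}-1}{2}\right\} & \text{otherwise}, \end{cases} \tag{3.5}$$

$${}_{\mathfrak{r}}\mathcal{H}^n(\zeta,\underline{\vartheta}) = \zeta\,{}_{\mathfrak{r}}\mathcal{H}^n + \underline{\vartheta}, \tag{3.6}$$

where $n = 1, 2, \ldots$; $\zeta \in \Re^+$, $\underline{\vartheta} \in \Re^n$ and ${}_{\mathfrak{r}}\mathcal{H}^n$ is the n-fold Cartesian
product of ${}_{\mathfrak{r}}\mathcal{H}$. (${}_{\mathfrak{r}}\mathcal{H}(\zeta,\vartheta) = {}_{\mathfrak{r}}\mathcal{H}^1(\zeta,\vartheta)$ $\forall \zeta \in \Re^+$, $\vartheta \in \Re$.) The collection
of scaled and translated binary vectors introduced in § 3.1 are specific
instances of the discrete sets indicated by Equation 3.6: $\mathcal{B}^n(\zeta,\underline{\vartheta}) =
{}_{2}\mathcal{H}^n(2\zeta,\underline{\vartheta})$ for all $n = 1, 2, \ldots$, $\zeta \in \Re^+$, and $\underline{\vartheta} \in \Re^n$. A discrete space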
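${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ (analogous to $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$), $\mathfrak{r} = 2, 3, \ldots$, is also considered through
the following recursive construction

$${}_{\mathfrak{r}}\mathcal{P}_0^n(\zeta,\underline{\vartheta}) = \emptyset, \tag{3.7a}$$

$${}_{\mathfrak{r}}\mathcal{P}_1^n(\zeta,\underline{\vartheta}) = {}_{\mathfrak{r}}\mathcal{H}^n(\zeta,\underline{\vartheta}), \tag{3.7b}$$

$${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta}) = {}_{\mathfrak{r}}\mathcal{P}_{r-1}^n(\zeta,\underline{\vartheta}) \cup \Big(\bigcup_{i} {}_{\mathfrak{r}}\mathcal{H}^n(\,\cdot\,,\underline{v}_i)\Big), \quad r = 2, 3, \ldots, \tag{3.7c}$$

where $\underline{v}_i$, $i = 1, 2, \ldots, \mathfrak{r}^{n(r-1)}$, are the $\mathfrak{r}^{n(r-1)}$ (ordered) points of the discrete
set ${}_{\mathfrak{r}}\mathcal{P}_{r-1}^n(\zeta,\underline{\vartheta}) \setminus {}_{\mathfrak{r}}\mathcal{P}_{r-2}^n(\zeta,\underline{\vartheta})$, $n = 1, 2, \ldots$; $\zeta \in \Re^+$ and $\underline{\vartheta} \in \Re^n$.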
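The centroid of ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$, for all admissible values of $\mathfrak{r}$, $n$, $r$, $\zeta$ and
$\underline{\vartheta}$, is the same as that of ${}_{\mathfrak{r}}\mathcal{H}^n(\zeta,\underline{\vartheta})$ in the above construction. It is
also important to note that in the above construction of ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ the
centroidal element of the scaled and shifted discrete space ${}_{\mathfrak{r}}\mathcal{H}^n$
is made to coincide with an appropriate element of ${}_{\mathfrak{r}}\mathcal{P}_{r'}^n(\zeta,\underline{\vartheta})$, $r' =
1, 2, \ldots, r-1$. As a consequence, the structure of ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ is not the
same for even and odd radices $\mathfrak{r}$ except for the case when $r$, the ranking
index, is unity. ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ with an odd radix $\mathfrak{r}$ will, in general, have several
points of different ranks coinciding as compared to the case when
$\mathfrak{r}$ is even. The following is a consequence of the above construction.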


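PROPOSITION 3.4.1 The number of distinct points included in
the discrete space ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$, $\mathfrak{r} = 2, 3, \ldots$; $n = 1, 2, \ldots$; $r = 1, 2, \ldots$; $\zeta \in \Re^+$
and $\underline{\vartheta} \in \Re^n$, is given by

$$|{}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})| = \begin{cases} \dfrac{\mathfrak{r}^n(\mathfrak{r}^{nr} - 1)}{\mathfrak{r}^n - 1} & \text{if } \mathfrak{r} \text{ is even},\\[8pt] \mathfrak{r}^{nr} & \text{otherwise.} \end{cases}$$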
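
As a quick consistency check, the count can be evaluated for the binary example used earlier. The sketch below assumes the reading adopted here, namely that for an even radix each rank $k$ contributes $\mathfrak{r}^{nk}$ new points (a geometric sum) and that for an odd radix the total is $\mathfrak{r}^{nr}$; the function name `cardinality` is hypothetical.

```python
def cardinality(radix, n, rank):
    """Number of distinct points, under the reading assumed in this sketch."""
    if radix % 2 == 0:
        # Even radix: rank k adds radix**(n*k) new points.
        return sum(radix ** (n * k) for k in range(1, rank + 1))
    # Odd radix: points of different ranks coincide.
    return radix ** (n * rank)


# Radix 2, dimension 2, rank 3.
print(cardinality(2, 2, 3))
```

For radix 2, dimension 2 and rank 3 this gives 84, which agrees with the number of entries listed in each of Tables 3.5 and 3.6.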


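The growth in cardinality of ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ as a function of the ranking
index $r$, $r = 1, 2, \ldots$, is similar in nature to that demonstrated in
$\mathcal{P}_r^n(\zeta,\underline{\vartheta})$, $n = 1, 2, \ldots$, $\zeta \in \Re^+$ and $\underline{\vartheta} \in \Re^n$, including the property of
denseness in $\mathcal{L}_{\underline{w}}$ for a preservance weight $\underline{w} \in P_n(\alpha)$, for any $\alpha \in \Re^+$:
hence, these aspects of ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ are not being established explicitly.
In a similar way the interval subdivisions in $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$, for any
$\alpha \in \Re^+$ and admissible values for $n$, $r$, $\zeta$ and $\underline{\vartheta}$, are also seen in the
discrete space $\mathcal{L}_{\underline{w}}(\alpha,{}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$ corresponding to ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ for all radices $\mathfrak{r}$,
$\mathfrak{r} = 2, 3, \ldots$. A few other properties of the discrete spaces ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ are
stated in the following.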

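PROPOSITION 3.4.2 For distinct radices $\mathfrak{r}_1$ and $\mathfrak{r}_2$, $\mathfrak{r}_1, \mathfrak{r}_2 = 2, 3, \ldots$,
such that $\mathfrak{r}_1$ and $\mathfrak{r}_2$ are both even (or both odd) and $\mathfrak{r}_1 < \mathfrak{r}_2$, ${}_{\mathfrak{r}_1}\mathcal{P}_r^n(\zeta,\underline{\vartheta}) \subset
{}_{\mathfrak{r}_2}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ for all admissible values of $n$, $r$, $\zeta$ and $\underline{\vartheta}$.

PROOF: Noting that ${}_{\mathfrak{r}_1}\mathcal{H} \subset {}_{\mathfrak{r}_2}\mathcal{H}$ for the hypothesis considered on $\mathfrak{r}_1$
and $\mathfrak{r}_2$, the statement is an immediate consequence of the construction
of the discrete space ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$.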

□ 

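COROLLARY TO PROPOSITION 3.4.2 ${}_{\mathfrak{r}_1}\mathcal{P}_{r_1}^n(\zeta,\underline{\vartheta}) \subset {}_{\mathfrak{r}_2}\mathcal{P}_{r_2}^n(\zeta,\underline{\vartheta})$ for all
$\mathfrak{r}_1, \mathfrak{r}_2$ such that $\mathfrak{r}_1$ and $\mathfrak{r}_2$ are both even, or both odd, $\mathfrak{r}_1 < \mathfrak{r}_2$, $\mathfrak{r}_1, \mathfrak{r}_2 =
2, 3, \ldots$; $r_1 < r_2$, $r_1, r_2 = 1, 2, \ldots$ and admissible values of $n$, $\zeta$ and $\underline{\vartheta}$.

PROOF: From Equation 3.7 (p. 172) it is clear that ${}_{\mathfrak{r}}\mathcal{P}_{r_1}^n(\zeta,\underline{\vartheta}) \subset {}_{\mathfrak{r}}\mathcal{P}_{r_2}^n(\zeta,\underline{\vartheta})$
for all admissible values of $\mathfrak{r}$, $n$, $\zeta$ and $\underline{\vartheta}$ when $r_1 < r_2$, $r_1, r_2 = 1, 2, \ldots$.
The necessary statement is an immediate consequence of this observation
combined with Proposition 3.4.2.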

□ 

In view of Theorem 3.4.1 (p. 172), the preservation of ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ is
assured as in the following.



Section 3.4. Higher Radix Preservance Spaces 


175 


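THEOREM 3.4.2 Weights $\underline{w}$ given by the assignment

$$w_i \in \bigcup_{j=1}^{n} B(\alpha\,\mathfrak{r}^{j-1}, 0), \quad \alpha \in \Re^+,\ i = 1, 2, \ldots, n, \tag{3.8}$$

subject to the restriction that $|w_i| \neq |w_k|$ for all $i, k = 1, 2, \ldots, n$, $i \neq k$,
preserve, in $\mathcal{L}_{\underline{w}}$, all points of the discrete space ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$, $\mathfrak{r} = 2, 3, \ldots$;
$n = 1, 2, \ldots$; $\zeta \in \Re^+$ and $\underline{\vartheta} \in \Re^n$.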

The proof of this statement follows exactly on the lines of Theorem
3.1.2 (p. 115). In the following, the collection of preservance
weights of the discrete space ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$, for the admissible values of $\mathfrak{r}$,
$n$, $r$, $\zeta$ and $\underline{\vartheta}$, will be denoted by ${}_{\mathfrak{r}}P_n$ and the restriction of preservance
weights to any specific value of $\alpha \in \Re^+$ is denoted by ${}_{\mathfrak{r}}P_n(\alpha)$. (Note
that this notation is analogous to $P_n$ and $P_n(\alpha)$, respectively; $P_n = {}_{2}P_n$,
$P_n(\alpha) = {}_{2}P_n(\alpha)$.) The following characteristic of the collection of preservance
weights is an immediate consequence of Theorem 3.4.2 (p. 175).

PROPOSITION 3.4.3 For all radices $\mathfrak{r}$, $\mathfrak{r} = 2, 3, \ldots$,

$$|{}_{2}P_n(\alpha)| = |{}_{3}P_n(\alpha)| = \cdots = |{}_{\mathfrak{r}}P_n(\alpha)| = \cdots = n!\,2^n$$

for any $\alpha \in \Re^+$ and all $n = 1, 2, \ldots$.
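
The count $n!\,2^n$ can be reproduced by enumeration if each component of the weight is taken, up to sign, from the magnitudes $\alpha\,\mathfrak{r}^{j-1}$ with no magnitude repeated. That reading of the assignment is an assumption of the sketch below, and the function name `preservance_weights` is hypothetical.

```python
from itertools import permutations, product
from math import factorial


def preservance_weights(radix, n, alpha=1.0):
    """Signed permutations of (alpha * radix**j), j = 0..n-1 (assumed form)."""
    base = [alpha * radix ** j for j in range(n)]
    return {
        tuple(s * v for s, v in zip(signs, perm))
        for perm in permutations(base)
        for signs in product((-1, 1), repeat=n)
    }


n = 3
for radix in (2, 3, 5):
    print(radix, len(preservance_weights(radix, n)), factorial(n) * 2 ** n)
```

The count does not depend on the radix because only the ordering and the signs of the $n$ distinct magnitudes are varied.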

Consider a number representation system whose radix, denoted by
$\mathfrak{r}_r$, is given recursively by

$$\mathfrak{r}_1 = 2, \tag{3.9}$$

$$\mathfrak{r}_r = 2^n(\mathfrak{r}_{r-1} - 1) + 3, \quad r = 2, 3, \ldots. \tag{3.10}$$

Then the following characterization of preservance using weights in $P_n$
relating ranking and radices is interesting.

THEOREM 3.4.3 $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$, $r = 1, 2, \ldots$, is the largest subset of the
discrete subspace ${}_{\mathfrak{r}_r}\mathcal{P}_1^n(\zeta,\underline{\vartheta})$ that is preserved, in $\mathcal{L}_{\underline{w}}$, by weights $\underline{w} \in P_n(\alpha)$
for any $\alpha \in \Re^+$ and admissible values of $n$, $\zeta$ and $\underline{\vartheta}$.


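PROOF: In view of the construction of the spaces $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ and ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$,
it is immediately apparent that the smallest radix $\mathfrak{r}$ for which $\mathcal{P}_r^n(\zeta,\underline{\vartheta}) \subset
{}_{\mathfrak{r}}\mathcal{P}_1^n(\zeta,\underline{\vartheta})$ is given by Equation 3.10. Noting that every weight $\underline{w} \in
P_n(\alpha)$, for any $\alpha \in \Re^+$, preserves all points of $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ in $\mathcal{L}_{\underline{w}}$
through the discrete subset $\mathcal{L}_{\underline{w}}(\alpha,\mathcal{P}_r^n(\zeta,\underline{\vartheta}))$, a study of the influence of
points in ${}_{\mathfrak{r}}\mathcal{P}_1^n(\zeta,\underline{\vartheta}) \setminus \mathcal{P}_r^n(\zeta,\underline{\vartheta})$ under weights $\underline{w} \in P_n(\alpha)$ shows that there
exists at least one pair of points, say $(\underline{x}_1, \underline{x}_2)$, such that $\underline{x}_1 \in \mathcal{P}_r^n(\zeta,\underline{\vartheta})$
and $\underline{x}_2 \in {}_{\mathfrak{r}}\mathcal{P}_1^n(\zeta,\underline{\vartheta}) \setminus \mathcal{P}_r^n(\zeta,\underline{\vartheta})$ and $\underline{w}\cdot\underline{x}_1 = \underline{w}\cdot\underline{x}_2$ for the specific weight
$\underline{w} \in P_n(\alpha)$. The associated breakdown of the one-one correspondence
disallows $\underline{w} \in P_n(\alpha)$ from being a preservance weight for discrete spaces
having components from ${}_{\mathfrak{r}}\mathcal{P}_1^n(\zeta,\underline{\vartheta}) \setminus \mathcal{P}_r^n(\zeta,\underline{\vartheta})$ as well as $\mathcal{P}_r^n(\zeta,\underline{\vartheta})$. It is
also important to recognize that since the points in ${}_{\mathfrak{r}}\mathcal{P}_1^n(\zeta,\underline{\vartheta}) \setminus \mathcal{P}_r^n(\zeta,\underline{\vartheta})$
are, in general, not uniquely mapped into $\mathcal{L}_{\underline{w}}$ under inner product
involving a weight $\underline{w} \in P_n(\alpha)$, the elements of $P_n(\alpha)$ are disallowed
from being preservance weights for discrete spaces that are subsets of
${}_{\mathfrak{r}}\mathcal{P}_1^n(\zeta,\underline{\vartheta}) \setminus \mathcal{P}_r^n(\zeta,\underline{\vartheta})$.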





From the foregoing, it is important to note that though the collections
of preservance weights are equinumerous for a given dimensionality
$n$ and ranking $r$, as stated in Proposition 3.4.3 (p. 175), the
distinctness in the preservance weights as established in Theorem 3.4.3
(p. 176) prevents isomorphisms from being established between the
preservance weights corresponding to the discrete spaces ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ for
different radices $\mathfrak{r}$, $\mathfrak{r} = 2, 3, \ldots$. Moreover, given an $\alpha \in \Re^+$, $\|\underline{w}\|$
increases with the radix $\mathfrak{r}$ for all weights $\underline{w} \in {}_{\mathfrak{r}}P_n(\alpha)$ and, hence, as the
radix $\mathfrak{r}$ increases, vectors derived from the preservance weights $\underline{w}$ as $\underline{w}/\|\underline{w}\|$
tend to cluster around the $n$ (unit norm) basis vectors of $\Re^n$. (Note that
${}_{\mathfrak{r}}P_n(\alpha) \subset \Re^n$ for all $\alpha \in \Re^+$.) This bunching of preservance weights
corresponding to the discrete spaces ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ as the radix $\mathfrak{r}$ increases
greatly diminishes the utility, from the point of view of preservance, of
enlargements of the subsets of $\Re^n$ preserved, in $\mathcal{L}_{\underline{w}}$, under inner product
with preservance weight $\underline{w}$.
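
The bunching of normalised weights can be seen numerically for the base weight $w_i = \mathfrak{r}^{i-1}$ of Theorem 3.4.1. The sketch below is illustrative only; the function name `normalised_weight` and the chosen radices are assumptions made here.

```python
from math import sqrt


def normalised_weight(radix, n):
    """Unit-norm version of the weight w_i = radix**(i-1), i = 1..n."""
    w = [float(radix) ** i for i in range(n)]
    norm = sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


for radix in (2, 4, 10):
    print(radix, [round(v, 3) for v in normalised_weight(radix, 3)])
```

As the radix grows, the last component approaches one and the others approach zero, so the normalised weight moves towards a basis vector; the signed permutations of the same weight behave in the same way around the other basis vectors.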

In the foregoing, the analysis has been one of finding preservance
weights given a discrete subset of $\Re^n$, and the discrete subsets have
been chosen to be derived from the basic space $\mathcal{B}^n$ by means of scaling
and translation. Despite the limitation of weight bunching with an
increase in the radix of representation, preservance weights and the
associated preservance are not restricted to a trivial discrete subset of
$\Re^n$ as indicated in the following.
THEOREM 3.4.4 Given any weight $\underline{w} \in \Re^n$, $\|\underline{w}\| \neq 0$, there exists a
discrete subset of $\Re^n$, in one-one correspondence with $\mathcal{B}^n$, all of whose
points are preserved in $\mathcal{L}_{\underline{w}}$.

PROOF: A constructive proof is provided.

Consider the discrete subset of $\Re^n$ given by the collection of points
$\underline{x}$, $\|\underline{x}\| = \sqrt{n}$, such that

$$\underline{w}\cdot\underline{x} = i\zeta, \quad i = -(2^n - 1), \ldots, -3, -1, 1, 3, \ldots, (2^n - 1),$$

for some appropriate value of $\zeta \in \Re^+$. An illustration of this space for
$n = 2$ and $\mathfrak{r} = 2$ accompanies in Figure 3.11 (p. 178): the discrete space
is made up of points of $\Re^n$ labeled a, b, c and d.

Figure 3.11: Preservance input space ${}^{\underline{w}}_{2}\mathcal{H}^2(1,\underline{0})$

One-one correspondence of the discrete space constructed above with
the discrete space $\mathcal{B}^n$ is obvious from the construction.


□ 


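In the following, I will denote n dimensional discrete spaces constructed
as in the above with a radix $\mathfrak{r}$ numbering by ${}^{\underline{w}}_{\mathfrak{r}}\mathcal{H}^n(\zeta,\underline{\vartheta})$, $\zeta \in \Re^+$
being the scale factor, $\underline{\vartheta} \in \Re^n$ the translation and the prescript $\underline{w}$
indicating explicitly the vector $\underline{w}$ with respect to which the space is
constructed. On the lines of ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$, a discrete space denoted by ${}^{\underline{w}}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$
will be constructed recursively as in the following.

$${}^{\underline{w}}_{\mathfrak{r}}\mathcal{P}_0^n(\zeta,\underline{\vartheta}) = \emptyset, \tag{3.11a}$$

$${}^{\underline{w}}_{\mathfrak{r}}\mathcal{P}_1^n(\zeta,\underline{\vartheta}) = {}^{\underline{w}}_{\mathfrak{r}}\mathcal{H}^n(\zeta,\underline{\vartheta}), \tag{3.11b}$$

$${}^{\underline{w}}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta}) = {}^{\underline{w}}_{\mathfrak{r}}\mathcal{P}_{r-1}^n(\zeta,\underline{\vartheta}) \cup \Big(\bigcup_{i} {}^{\underline{w}}_{\mathfrak{r}}\mathcal{H}^n(\,\cdot\,,\underline{v}_i)\Big), \quad r = 2, 3, \ldots, \tag{3.11c}$$

where $\underline{v}_i$, $i = 1, 2, \ldots, \mathfrak{r}^{n(r-1)}$, are the $\mathfrak{r}^{n(r-1)}$ (ordered) points of the discrete
set ${}^{\underline{w}}_{\mathfrak{r}}\mathcal{P}_{r-1}^n(\zeta,\underline{\vartheta}) \setminus {}^{\underline{w}}_{\mathfrak{r}}\mathcal{P}_{r-2}^n(\zeta,\underline{\vartheta})$, $n = 1, 2, \ldots$; $\zeta \in \Re^+$ and $\underline{\vartheta} \in \Re^n$. It is
immediately apparent that the discrete spaces ${}^{\underline{w}}_{\mathfrak{r}}\mathcal{H}^n(\zeta,\underline{\vartheta})$ and ${}^{\underline{w}}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$
incorporate rotation of the basic space $\mathcal{B}^n$ in addition to scaling and
translation. The spaces ${}^{\underline{w}}_{\mathfrak{r}}\mathcal{H}^n(\zeta,\underline{\vartheta})$ and ${}^{\underline{w}}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ being constructed such
that the given weight vector $\underline{w} \in \Re^n$, $\|\underline{w}\| \neq 0$, preserves all points
of these spaces in $\mathcal{L}_{\underline{w}}$, these spaces will be termed preservance input
spaces corresponding to the weight $\underline{w}$. For the sake of completeness, it
is worthwhile to note that ${}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta}) = {}^{\underline{w}^{\langle 0\rangle}}_{\mathfrak{r}}\mathcal{P}_r^n(\zeta,\underline{\vartheta})$ for all admissible
values of $\mathfrak{r}$, $n$, $r$, $\zeta$ and $\underline{\vartheta}$; $\underline{w}^{\langle 0\rangle}$, as indicated earlier, being the weight $\underline{w}$
whose elements are given by $w_i = \mathfrak{r}^{i-1}$, $i = 1, 2, \ldots, n$.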


As suggested in Theorem 3.1.2 (p. 115), the weight vector $w$, $w \in \Re^n$,
$\|w\| \neq 0$, will not be the lone preservance weight for the preservance
input space ${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta) \subset \Re^n$, and the collection of
preservance weights for ${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$ will be denoted by
${}^{w}\mathcal{P}_n$. Definitionally, the weight vector $w$ is an element of
${}^{w}\mathcal{P}_n$. Proposition 3.4.3 (p. 175) immediately implies that
$|{}^{w}\mathcal{P}_n|$ is $n!\,2^n$, the same as $|{}_{r}\mathcal{P}_n(\alpha)|$, for any
$\alpha \in \Re^+$. Note that the collection of preservance weights for the
preservance input space ${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$ corresponding to a given $w$
is in one-one correspondence with ${}_{r}\mathcal{P}_n(\alpha)$ for all $\alpha \in \Re^+$.
In addition, the discrete spaces ${}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$ and
${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$ are in one-one correspondence for all admissible
values of $n$, $r$, $\zeta$ and $\vartheta$. This one-one correspondence allows an
extension of the representational characteristics discussed earlier in the
context of the discrete input space ${}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$ to function
realization situations involving the discrete set
${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$ as the domain. However, the details of such
an extension are beyond the scope of this thesis. In § 4.3, I will consider
the issue of identification of preservance input spaces given a collection
of input vectors. Such an identification relates to the problems of
learning and generalization in neural networks.


3.5 Summary 

Isolated neurons, the functional basis in the connectionist (ie, neural 
network) approach to information processing, have been studied from 
the point of view of the representation potential for signal processors. 
Preservance of discrete space, assured for all non-null weights admis- 
sible in an isolated neuron, restricts this study to functions defined on 
discrete spaces; the discrete input space preserved is, in general, a
subset of lattice points embedded in the Euclidean space $\Re^n$ of
dimensionality n. As preservation of the discrete input spaces is not
restricted by the direction of the weight vector, input space
dimensionality and radix of numbering (radices higher than 2 are
considered only in the input space, the output space always being binary),
this notion is useful in simplifying functions of several discrete
variables to equivalent sequences on one-dimensional (discrete) spaces.

The interplay between the issues of learning and generalization in 
isolated neurons has been studied through the simplification of function 
representation enabled by preservance of discrete spaces: while learn- 
ing addresses the issue of specifying values for weights and threshold 
given examples of association between inputs and outputs, generaliza- 
tion concerns the equally important issue of extending the function to 
the region of the input space not covered by the training set. Gen- 
eralization is not without a criterion: the criterion, specified through 
either a test set (different from the training set) or qualitative specifi- 
cations, inevitably, amounts to a formulation in terms of the number 
of sign transitions (zero-crossings) in the equivalent sequence of the 
function being represented relative to the number of sign transitions 
(zero-crossings) that are accommodated by the activation function. 
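The idea of an equivalent sequence and its sign transitions can be illustrated with a small sketch. The assumptions here are mine: the input space is the bipolar binary square $\{-1, +1\}^2$, the weight has elements $2^{i-1}$, and the helper names `equivalent_sequence` and `sign_transitions` are hypothetical.

```python
from itertools import product

def equivalent_sequence(f, w):
    """List f over the points of {-1, +1}^n ordered by their projection on w."""
    points = list(product((-1, 1), repeat=len(w)))
    projections = [sum(wi * xi for wi, xi in zip(w, x)) for x in points]
    if len(set(projections)) != len(points):
        raise ValueError("w does not keep the points distinct")
    ordered = sorted(zip(projections, points))
    return [f(x) for _, x in ordered]

def sign_transitions(seq):
    """Number of positions where consecutive entries differ."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

w = (1, 2)
and_fn = lambda x: 1 if x == (1, 1) else -1
xor_fn = lambda x: 1 if x[0] != x[1] else -1
and_seq = equivalent_sequence(and_fn, w)
xor_seq = equivalent_sequence(xor_fn, w)
```

With this weight the AND sequence shows one sign transition, which a single hard-limiting neuron accommodates, while the XOR sequence shows two.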

Processor representation in isolated neurons, as discussed in this
chapter, is limited to processors (functions) that, on discrete spaces
preserved by a weight, say $w \in \Re^n$, demand an assignment which, when
projected as the equivalent sequence along the one-dimensional space
described by $w$, has no more sign transitions (zero-crossings) than are
accommodated by the activation function: in the case of sigmoidal
(including hard-limiting) activation functions, this restricted collection of
processors is termed linearly separable. The limited sense in which
processor representation is provided by isolated neurons necessitates a 
study of the representational characteristics of networked ensembles of 
neurons. Preservance being restricted to discrete spaces, it is of interest 
to know the possibility of representing symbolic computation through 
neural networks; this interest stems partially from the existing result, 
established by Lippmann (1987) and others, that with at least two 
layers of neural processing all Boolean functions of several variables 
can be represented. 



Chapter 4 


Layered Neural Signal Processing


[I]t is worth pondering the fact that a universal computer
could be built entirely out of linear threshold modules. This 
does not in any sense reduce the theory of computation and 
programming to the theory of perceptrons. Some philosophers 
might like to express the relevant general principle by saying 
that the computer is so much more than the sum of its parts 
that the computer scientist can afford to ignore the nature of 
the components and consider only their connectivity. 

— Marvin Minsky and Seymour Papert 
in Perceptrons: An introduction to computational geometry, 
MIT Press, Cambridge, MA, 1990 
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Isolated neurons incorporating hard-limiting activation functions are 
capable of representing interesting discrete functions but suffer
from the serious limitation of being able to represent, as indicated in 
Chapter 3, a shrinking fraction of the total number of possible functions. 
This limitation is seen in neurons equipped with sigmoidal activation 
functions too, the limitation being seen as a lack of denseness of the 
space of functions represented by isolated neurons measured relative 
to the space of continuous functions. 

Neuronal ensembles, investigated in the literature with a view to 
overcome the limitations in representation of isolated neurons, have 
principally been of the layered variety. Most common types are struc- 
tures which incorporate feed-forward connections and/or lateral inter- 
action; the details of these structures have already been presented 
in Chapter 2. (An alternate approach has been that of incorporating 
polynomial, or higher order, discriminants in the isolated neuron. In 
this approach the effect of non-linear association across layers in multi- 
layered networks is sought to be provided, equivalently, by higher-order 
interactions between the elements of the input pattern.) 

In this chapter, a logical continuation of the discussion initiated in 
the previous chapter, I will take up a study of layered neural networks. 
Neural signal processors,¹ as these processing structures are termed,
are essentially formulated as cascades of linear combinations of neurons
and differ from isolated neurons, the basic processing units, in being
able to represent a larger class of processors (see § 2.3).

¹ Apart from the issue that the neural processing paradigm allows for
learning (of internal representations) through examples, the main
difference between the neural and conventional approaches to information
(signal) processing is that while conventional signal processing requires
a priori knowledge of 'basis' functions, the given signal space being
identified as a subset of the linear span of the 'basis' functions, the
neural signal processing approach seeks to synthesize the relevant 'basis'
functions by identifying a suitable architecture and opting to choose the
weight and threshold values of the various participating nodes in the
architecture. The given signal space, as in conventional signal
processing, is still sought to be identified as a subset of the linear span
of the synthesized 'basis' functions; this particular interpretation, while
analytically convenient, is not mandatory. 'Basis' function synthesis is a
crucial component of the notion of representation in this thesis. The
manner in which the 'basis' functions are synthesized in neural networks
will be taken up in Chapter 6.

Preservance of discrete spaces and the associated issue of function
representation in neurons, as initiated in the previous chapter, leads,
naturally, to an enquiry into the possibility of getting an insight into the
aspect of function representation in layered neural signal processors.

In this chapter, beginning with a study of representation in single layer
neural signal processors (a network structure subjected to extensive
investigations in the literature), I establish, assuming identical weights
in all the processing nodes, that the number of distinct processing nodes
needed to represent a function on a discrete space is bounded above,
weakly, by the cardinality of the discrete space. The assumption of
identicality in the weight vectors of distinct nodes is not unrealistic in
view of the results established in Chapter 3.

I also establish that the issues of learning weight and threshold values
and that of generalization in single layered neural signal processors
relate to the corresponding issues in isolated neurons. Representation
of functions in single layered neural signal processors with the
processing nodes having weights that form preservance weights for the
(discrete) input space of the neural signal processor will be shown to
reduce to a problem of function representation in single layered neural
signal processors with identical weights in the processing nodes.

The reduction is facilitated by the symmetries and permutation be- 
tween preservance weights discussed in the preceding chapter. Despite 
the fact that single layered neural signal processors have been exten- 
sively investigated, the notion of minimal architecture is conspicuously 
absent in the presentations of neural network based information pro- 
cessing. I propose a notion of minimal neural signal processing architec- 
tures, the criterion of minimality being related to that of admissibility 
of (preservance) weights introduced in § 3.3. 

A study of the architecture of single layer neural signal processors 
shows that while these processors are able to represent all discrete func- 
tions on discrete spaces with hard-limiting activation function in the 
single layer of processing nodes and the space of functions represented 
is a dense subset of the space of continuous functions when sigmoidal 
activation function is employed in the processing nodes, the number 
of distinct processing nodes demanded to achieve the representation is 
unmanageably large. More precisely, the representational complexity 
is of exponential order. 
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The possibility of representing functions in multi-layered neural 
signal processors with a complexity smaller than that achievable in 
single layer has been studied; the study is, admittedly, of a preliminary 
nature in view of the analytical intractability introduced by the non- 
linear nature of the activation functions. In this study too, the notions 
of preservance of (discrete) input spaces and preservance weights are 
maintained to conform to the theme initiated earlier. 

A preservation of the uniqueness and relative order between the
input space points in the discriminants of the neurons by the weights
associated with the discriminant (ie, inner product) function is a
characterization of the nature of 'internal' representations effected in
neural networks. The issue of learning the weights of the first layer in a
neural network (assumed to operate on a preservance input space) is
essentially one of identifying a preservance input space for the collection
of inputs described in the training set. I have suggested an approach
that would aid an identification of a preservance input space given a
training set.

Realization of discrete-valued processors on discrete spaces, coupled
with the interpretation ascribed to preservance input spaces, encourages
a study of the possibility of realizing mappings between symbolic
spaces. Algebraic properties being the only available characterization
of symbol spaces, I establish an algebraic equivalent of the notion of
linear separability: a dichotomy over a symbol space, itself embedded
in a semi-lattice, is linearly separable if each member of the partition
induced by the dichotomy on the (input) symbol space is a semi-lattice.

Neural signal processors with a single layer of decision making are 
studied in § 4.1: this study is a continuation of function representation 
initiated in the previous chapter. The representational issues in
multi-layered neural networks, in particular the possibility of reducing
the representational complexity through layering in feed-forward networks,
are taken up in § 4.2 (p. 201). § 4.3 (p. 208) focuses on identifying
preservance input spaces appropriate to the collection of inputs in the
training set. Symbolic computation in neural networks is discussed in
§ 4.4 (p. 218) in preparation for a study, in Chapter 5, of the abstract
nature of the representational paradigm in neural networks.


4.1 Representation in Single Layer Neural Signal 
Processors 

Neural signal processors with a single layer of decision making, as
described in Chapter 2 (see § 2.3), are described, with a minor revision
of notation, as

$$\eta(x) = \sum_{i=1}^{m} v_i\, y_i(x) - \theta = \sum_{i=1}^{m} v_i\, \sigma\!\left(w_i^T x - \theta_i\right) - \theta. \tag{4.1}$$

In the above equation, $x$ denotes the input patterns, $x \in \Re^n$; $m$ is
the number of processing nodes in the (single) layer, $m = 1, 2, \ldots$;
$w_i$ are the weights associated with the processing nodes, $w_i \in \Re^n$,
$i = 1, 2, \ldots, m$; $\theta_i$ are the thresholds associated with the
processing nodes, $\theta_i \in \Re$, $i = 1, 2, \ldots, m$; $\sigma$ denotes the
activation function; $v_i$ are the coefficients of the linear combination
of neural decisions, $v_i \in \Re$, $i = 1, 2, \ldots, m$; $\theta$ is the bias,
$\theta \in \Re$; and $\eta$ is the neural signal processor response, $\eta \in \Re$.
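The processing scheme of Equation 4.1 can be written out as a short sketch. Several details are assumptions on my part: the bias is subtracted (as in the proof of Theorem 4.1.4), the hard limiter returns +1 at zero, `math.tanh` stands in for a bipolar sigmoid, and the function names are hypothetical.

```python
import math

def hard_limiter(u):
    """Bipolar hard limiter; the value at u == 0 is an assumption."""
    return 1.0 if u >= 0 else -1.0

def bipolar_sigmoid(u):
    """A smooth bipolar activation, used here as a stand-in sigmoid."""
    return math.tanh(u)

def response(x, weights, thresholds, coeffs, bias, activation=hard_limiter):
    """Linear combination of node decisions minus the bias."""
    total = 0.0
    for w, theta_i, v in zip(weights, thresholds, coeffs):
        discriminant = sum(wj * xj for wj, xj in zip(w, x)) - theta_i
        total += v * activation(discriminant)
    return total - bias

# Two nodes sharing the weight (1, 2) and differing only in threshold.
weights = [(1, 2), (1, 2)]
thresholds = [-2, 2]
coeffs = [1, -1]
eta_a = response((1, -1), weights, thresholds, coeffs, bias=1)
eta_b = response((1, 1), weights, thresholds, coeffs, bias=1)
```

Passing `activation=bipolar_sigmoid` gives the sigmoidal variant of the same structure.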

Given the relation between weights, in isolated neurons, and certain
discrete spaces (of $\Re^n$) that are preserved entirely in a
one-dimensional space (described by the chosen weight), as discussed in
Chapter 3, it is of interest to investigate the representational potential
of single layered neural signal processors under such preservance weights.
As the discrete subset of $\Re^n$ preserved and the one-dimensional
sub-space which accommodates the preservation points are decided in each
neuron by the corresponding weights $w_i$, $i$ being the index of the
neuron, it is not feasible to carry out a general analytical discussion for
all the situations of weight realization.

² This approach, of using linear combinations of neural responses as the
response of a processor, is in vogue in the literature (see eg,
Rosenblatt, 1958; Albus, 1975; Hecht-Nielsen, 1987b, 1987c; Caudill &
Butler, 1990) and is also evident in the notion of instar-outstar neurons
(Grossberg, 1982). It is important to note that though all the processing
nodes employ the same activation function, the decisions at these nodes,
based on the specific choice of weights and threshold, need not all be
identical. Processing structures of the kind described in Equation 4.1
have been extensively investigated in the literature and, as stated
earlier, have been shown (cf, Cybenko, 1990; Hornik, Stinchcombe & White,
1989) to represent the space of continuous functions with any desired
accuracy.

I will begin by considering the specific case wherein the weights of
all nodes are identical, differences in the decision mechanism at the
various nodes being provided by threshold values. Without any loss
of generality, this situation is considered as a discussion of the
representation, by single layer neural signal processors, of functions defined
over the discrete subset — 'P"(C)32) = (C>^) C 5?'* with appropriate 

admissible values for $r$, $\zeta$ and $\vartheta$. (Note that the radix of
numbering does not influence preservance except for accommodating a larger
number of discrete points of $\Re^n$ as the radix increases.)

PROPOSITION 4.1.1 If the weights of all processing nodes are identical
in a single layer neural signal processor, ie, $w_i = w$ $\forall i$,
$i = 1, 2, \ldots, m$, the component functions $y_i$, $i = 1, 2, \ldots, m$, are
all defined on a common discrete subset
$\mathcal{L}_w(\|w\|, {}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta))$.

PROPOSITION 4.1.2 Functions realized as $\eta(x)$ in a single layer neural
signal processor wherein weights in all processing nodes are identical
are a linear combination of the univariate functions, over
$\mathcal{L}_w(\|w\|, {}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta))$, corresponding to the functions of
the individual processing nodes.

PROPOSITION 4.1.3 Function realization in single layer neural signal
processors wherein the weights of all processing nodes are identical is
influenced only by the location of thresholds $\theta_i$, coefficients of
linear combination $v_i$, $i = 1, 2, \ldots, m$, and bias $\theta$.

These statements follow immediately from Proposition 3.2.1 (p. 138)
and the processing scheme introduced in Equation 4.1 (p. 188). Seeking
the structure of functions realized over
$\mathcal{L}_w(\|w\|, {}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta))$, given a weight $w \in \Re^n$
common to all processing nodes of a single layer neural signal processor,
the following are easy to establish.

PROPOSITION 4.1.4 Sign transitions of each component function $y_i$,
$i = 1, 2, \ldots, m$, under superposition induce level transitions in $\eta$.

This statement is a direct consequence of $\eta(x)$ being realized as a linear
combination of the component functions $y_i(x)$ for all
$x \in {}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$ and the addition of a bias $\theta$.

PROPOSITION 4.1.5 The location of every level transition in $\eta(x)$ is
inherited from a sign transition of $y_i(x)$ for some $i$, $i = 1, 2, \ldots, m$.

PROPOSITION 4.1.6 Each sign transition of $y_i$, for all
$i = 1, 2, \ldots, m$, contributes to a single level transition of $\eta(x)$.

Proposition 4.1.5 is a simple consequence of superposition and
Proposition 4.1.6 follows from the notion of a function.
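The propositions above can be checked on a small example. The sketch below assumes hard-limiting nodes sharing one weight, so that only the ordered projections matter; the projection values, thresholds and coefficients are made up for illustration and the helper names are hypothetical.

```python
def hard_limiter(u):
    return 1 if u >= 0 else -1

def eta_sequence(projections, thresholds, coeffs, bias):
    """Response along the ordered projections when every node shares one weight."""
    return [sum(v * hard_limiter(p - t) for v, t in zip(coeffs, thresholds)) - bias
            for p in projections]

def level_transition_gaps(seq):
    """Indices k such that the sequence changes level between k and k + 1."""
    return [k for k in range(len(seq) - 1) if seq[k] != seq[k + 1]]

projections = [-7, -5, -3, -1, 1, 3, 5, 7]   # ordered projection points
thresholds = [-4, 0, 6]                       # one threshold per node
coeffs = [1.0, -2.0, 0.5]
eta = eta_sequence(projections, thresholds, coeffs, bias=0.25)
gaps = level_transition_gaps(eta)
# Gap index in which each threshold falls (between consecutive projections).
threshold_gaps = sorted({sum(1 for p in projections if p < t) - 1 for t in thresholds})
```

Every level transition of the response sits in a gap occupied by some threshold, and there are no more transitions than nodes. Two thresholds placed in the same gap would merge into one transition (or cancel), so the count can only go down.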


THEOREM 4.1.1 The number of level transitions in the discrete sequences
over $\mathcal{L}_w(\|w\|, {}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta))$,³ for any $w \in \Re^n$ and
admissible values of $n$, $r$, $\zeta$ and $\vartheta$, resulting from a realization
of single layer neural signal processors as a linear combination of $m$,
$m = 1, 2, \ldots$, decision elements based on sigmoidal (including
hard-limiter) activation functions and identical weights in all processing
nodes does not exceed the number of component decision functions, ie, $m$.

This statement follows from Proposition 4.1.5 and Proposition 4.1.6.
Note that the number of level transitions is independent of the bias $\theta$
and $\|w\|$. Immediate implications of this theorem follow.

³ Note that the discrete space ${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$ consists of unions
of binary collections, each of which is preserved by the (non-null) weight
$w \in \Re^n$ in $\mathcal{L}_w$. As the preservance weight $w$ is specific to this
preservance input space, not only in direction but also in magnitude,
the collection of projection points, $\mathcal{L}_w$, is indicated to be
influenced by the norm $\|w\|$ of the preservance weight $w$.

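THEOREM 4.1.2 A two layered network of neurons constructed as
$y^{(2)}(x) = \sigma(\eta(x))$, for all
$x \in {}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta) \subset \Re^n$, for any $w \in \Re^n$ and all
admissible values of $r$, $\zeta$ and $\vartheta$, exhibits no more than $m$ sign
transitions in the discrete sequences over
$\mathcal{L}_w(\|w\|, {}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta))$ representing bipolar bivalent
functions (ie, bipolar dichotomies) over ${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$
when the activation function $\sigma$ is sigmoidal.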

THEOREM 4.1.3 No more than $|{}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)|$ linearly
separable nodes are necessary in the two layered neural network of
Theorem 4.1.2 to realize all bipolar bivalent functions over
${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$.

This statement, however, is not unknown in the specific context wherein
$r = 1$, $\zeta = 1$ and $\vartheta = 0$, ie, ${}_{1}\mathcal{P}^{n}(1, 0) = B^n$; bipolar
bivalent functions (over ${}_{1}\mathcal{P}^{n}(1, 0)$) are termed Boolean functions
of n binary variables. For example, Lippmann (1987) and Hertz, Krogh &
Palmer (1991) have given different arguments in support of the claim that
a linear combination of linearly separable nodes followed by a
hard-limiting comparator is adequate to realize all Boolean functions of n
binary variables.
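As a concrete instance of the claim above, the sketch below realizes XOR on the bipolar binary square with two linearly separable nodes that share one weight, followed by a hard-limiting comparator. The particular weight, thresholds, coefficients and bias are my own choices for illustration, not values taken from the cited works.

```python
from itertools import product

def hard_limiter(u):
    return 1 if u >= 0 else -1

def two_layer_output(x, w, thresholds, coeffs, bias):
    """Hard-limited linear combination of nodes that share the weight w."""
    p = sum(wi * xi for wi, xi in zip(w, x))
    eta = sum(v * hard_limiter(p - t) for v, t in zip(coeffs, thresholds)) - bias
    return hard_limiter(eta)

w = (1, 2)            # common weight; projections are -3, -1, 1, 3
thresholds = (-2, 2)  # one threshold in each gap where XOR changes sign
coeffs = (1, -1)
bias = 1
xor_table = {x: two_layer_output(x, w, thresholds, coeffs, bias)
             for x in product((-1, 1), repeat=2)}
```

Two nodes suffice here because the equivalent sequence of XOR under this weight has two sign transitions.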

THEOREM 4.1.4 Neural signal processors described by the functional
form in Equation 4.1 realize all complete dichotomies on
${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$ provided sufficiently many processing nodes are
available.

PROOF: Consider a single layer neural signal processor given by:

$$\eta(x) = \sum_{i=1}^{|{}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)| - 1} v_i\, y_i(x) - \theta,$$

$$y_i(x) = \sigma\!\left(w^T x - \theta_i\right),$$

where $\theta_i \in {}^{w}_{r}\Theta^{n}_{i}(\|w\|, \zeta, \vartheta)$,
$i = 1, 2, \ldots, |{}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)| - 1$. In this structure, which
assumes the maximum number of processing nodes, the effective contribution
of processing nodes representing trivial functions, ie, functions that are
constant for all input values, is assumed to be specified by the bias
term $\theta$.

Noting that $y_i$, for each $i$, represents a linearly separable dichotomy
on ${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$, and that the ordering imposed in the
assignment to $\theta_i$, in the sense that $\theta_1 < \theta_2 < \cdots$, ensures
that $\eta$ is expressed as a linear combination of all distinct (except for
complements) linearly separable dichotomies on
${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$, given any dichotomy, expressed for notational
convenience as the array $\eta$,⁴

$$\eta = \left[\eta(x_j)\right]_{j=1}^{|{}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)|}$$

such that

$$w^T x_i < w^T x_j \quad \text{for all } i < j, \quad i, j = 1, 2, \ldots, |{}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)|,$$

the functionality of the single layer neural signal processor can
equivalently be expressed as:

$$\eta = Y v, \tag{4.2}$$

where $v = [v_1, v_2, \ldots, v_{|{}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)| - 1}]^T$ and $Y$ is the
matrix

$$Y = \left[y_j(x_i)\right], \quad i = 1, \ldots, |{}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)|, \quad j = 1, \ldots, |{}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)| - 1.$$

The matrix $Y$ is independent of the given dichotomy and, hence, a
solution to the linear system indicated in Equation 4.2, in an attempt to
find the minimum norm solution for $v$, would be

$$v = (Y^T Y)^{*}\, Y^T \eta,$$

where $A^{*}$ is the Moore-Penrose pseudo-inverse (cf, Penrose, 1955;
Ben-Israel & Greville, 1974) of a matrix $A$.

From the structure of $Y$ it is immediately evident that $Y^T Y$ is
non-singular and, hence, its Moore-Penrose pseudo-inverse is assured.

□

⁴ $\eta$ is the equivalent sequence of the given dichotomy on
${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$ for a non-null weight $w \in \Re^n$.

Comparing the representation provided by network structures described
by Equation 4.1 (p. 188) with that provided by a two layer neural network
suggested in Theorem 4.1.2 (p. 192), it is immediately apparent
that the activation function, operating on the response $\eta$, serves to
render the bias term $\theta$ of neural signal processors redundant.
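A small worked instance may make the structure of the matrix in the proof concrete. It is a sketch under my own assumptions: the input space is the bipolar binary square, the common weight is $w = (1, 2)$ (ordered projections $-3, -1, 1, 3$), the activation is a hard limiter, one threshold sits in each gap between consecutive projections ($-2, 0, 2$), and the bias is kept explicit as in the proof.

```latex
% Rows are the ordered points x_1..x_4, columns are the three nodes.
Y = \begin{bmatrix}
-1 & -1 & -1 \\
+1 & -1 & -1 \\
+1 & +1 & -1 \\
+1 & +1 & +1
\end{bmatrix},
\qquad
\eta = Y v - \theta\,\mathbf{1}.

% Consecutive rows of Y differ in exactly one entry (from -1 to +1), so
\eta(x_{j+1}) - \eta(x_j) = 2\,v_j, \qquad j = 1, 2, 3.

% For the XOR sequence \eta = (-1, +1, +1, -1)^T this gives
v = (1,\; 0,\; -1)^T,
\qquad
-(v_1 + v_2 + v_3) - \theta = -1 \;\Rightarrow\; \theta = 1.
```

The columns of this $Y$ are linearly independent, so $Y^T Y$ is non-singular in this instance. The zero coefficient $v_2$ shows a node that does not contribute to this particular dichotomy.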

Though the above statement has been established with the notion of
sufficiently many processing nodes interpreted in the sense of a finite
upper bound on the number of processing nodes, it is never the case that
all processing nodes, each representing distinct linearly separable
bipolar bivalent functions over ${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$, will participate in
the synthesis of every dichotomy on ${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$. Noting from
Proposition 4.1.5 (p. 191) and Proposition 4.1.6 (p. 191) that the number
of decision elements in a single layer neural signal processor is
identical to the number of sign transitions in $\eta$, the number of
processing nodes required to represent a given dichotomy can be made
smaller than $|{}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)| - 1$ by an appropriate choice of
common preservance weight $w$. This consideration, being similar to the
notion of admissibility of weights to a given dichotomy, together with the
notion of architecture discussed in Chapter 2 (see § 2.2), prompts the
following.


DEFINITION 4.1.1 Given a function $f$ (dichotomy $\mathcal{D}$) on
${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$,
$\mathcal{D}: {}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta) \to \{C^-, C^+\}$, for a non-null weight
$w \in \Re^n$, the architecture of a single layer neural signal processor,
expressed as the combined specification of the number of nodes $m$ in the
single layer of processing and the weights and thresholds in the
individual processing nodes, $w_i$, $\theta_i$, $i = 1, 2, \ldots, m$, bias
$\theta$ and coefficients of linear combination $v$ that realize $f$
($\mathcal{D}$), is termed minimal for $f$ ($\mathcal{D}$) if a realization of $f$
($\mathcal{D}$) in that architecture cannot be achieved with fewer than $m$
processing nodes.

Having established the nature of relationship between the function
$\eta$ realized by a single layer neural signal processor and the component
(decision) functions $y_i$, $i = 1, 2, \ldots, m$, in the restricted context
of all processing nodes having identical weights, it is imperative that
attention be given to the crucial problem of learning, ie, an automated
specification of values for the common weight $w$, thresholds $\theta_i$,
superposition coefficients $v_i$, $i = 1, 2, \ldots, m$, and bias $\theta$. In
view of the fact that $\eta$ is a point-wise addition of component functions
$y_i$, $i = 1, 2, \ldots, m$, over the support
$\mathcal{L}_w(\|w\|, {}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta))$, as suggested in Proposition 4.1.1
(p. 190) and Proposition 4.1.2 (p. 190), specification of the weight $w$
will involve the admissibility criterion discussed in § 3.3 (see
Theorem 3.3.2 (p. 161)). However, this criterion will be applicable only
over the component processing nodes and not on the overall function
synthesized by the neural signal processor, as a consequence of
Theorem 4.1.4 (p. 193).

Learning (with generalization) of minimal architectures, in view of
the preceding definition, is formulated as the following search problem:

$$\min_{\{w_i\},\, \{\theta_i\}} \; \sum_{j} \left| f(x_{j+1}) - f(x_j) \right|. \tag{4.3}$$

In the above expression, ordering in the points of
${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$ is in the same sense as discussed in the proof of
Theorem 4.1.4 (p. 193), $f$ is the desired function on
${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$, available through examples, that is to be
represented in the single layer neural signal processor and the
collections $\{w_i\}$ and $\{\theta_i\}$ are with respect to the $m$ processing
nodes in the ensemble.
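The search over weights can be illustrated with a brute-force sketch. The assumptions are mine: the input space is $\{-1, +1\}^3$, all nodes share one weight, the candidate weights are the signed permutations of a base weight $(1, 2, 4)$ (there are $n!\,2^n = 48$ of them for n = 3), and thresholds are not searched because the cost depends only on the ordering of the points. The function names are hypothetical.

```python
from itertools import permutations, product

def signed_permutations(base):
    """All weights obtained from base by permuting and sign-flipping elements."""
    seen = set()
    for perm in permutations(base):
        for signs in product((-1, 1), repeat=len(base)):
            seen.add(tuple(s * p for s, p in zip(signs, perm)))
    return sorted(seen)

def total_variation(f, w):
    """Sum of |f(x_{j+1}) - f(x_j)| over {-1, +1}^n ordered by w.x."""
    points = sorted(product((-1, 1), repeat=len(w)),
                    key=lambda x: sum(wi * xi for wi, xi in zip(w, x)))
    values = [f(x) for x in points]
    return sum(abs(b - a) for a, b in zip(values, values[1:]))

base = (1, 2, 4)
candidates = signed_permutations(base)
f = lambda x: x[0]  # desired function: the sign of the first input
best = min(candidates, key=lambda w: total_variation(f, w))
best_cost = total_variation(f, best)
worst_cost = max(total_variation(f, w) for w in candidates)
```

Weights that give the first input the largest magnitude order the points so that the function changes level once; weights that give it the smallest magnitude make it alternate at every step.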

In addition to the above criterion, the search is based on a criterion
that evaluates the goodness of approximation in the sense of an
appropriately designed norm on the space of training samples. This
term, however, has not been indicated in Equation 4.3 as Theorem 4.1.4
(p. 193) assures exactness of realization, particularly when hard-limiting
activation functions are used. In light of the discussion in Chapter 3,
the procedure for search of function representation with preservance
weights and on preservance input spaces is of the following nature.

THEOREM 4.1.5 Learning of a given non-trivial bipolar dichotomy
$\mathcal{D}: \mathcal{X} \to \{C^-, C^+\}$ in a single layer neural signal processor with
$m$ processing nodes and hard-limiting activation function involves the
sequence of three distinct steps:

1. Identify, for a given $\epsilon > 0$, a suitable discrete subset
${}^{w_0}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$, $w_0 \in \Re^n$, $\|w_0\| \neq 0$,
$r = 1, 2, \ldots$, $\zeta \in \Re^+$ and $\vartheta \in \Re^n$, corresponding to
$\mathcal{X} \subset \Re^n$, $|\mathcal{X}| < \infty$, such that

$$\max_{x \in \mathcal{X}} \; \min_{x^* \in {}^{w_0}_{r}\mathcal{P}^{n}(\zeta, \vartheta)} \|x^* - x\| < \epsilon.$$

Assign to the weight $w$ any element in ${}^{w_0}\mathcal{P}_n$ and correspondingly
assign to $m$, the number of processing nodes in the single layer neural
signal processor, the number of level transitions in the (equivalent)
discrete sequence over
$\mathcal{L}_{w_0}(\|w_0\|, {}^{w_0}_{r}\mathcal{P}^{n}(\zeta, \vartheta))$ representing the given
dichotomy $\mathcal{D}$.

2. Order the locations of the level transitions, and for
$i = 1, 2, \ldots, m$ assign the threshold $\theta_i$ any element of
${}^{w_0}_{r}\Theta^{n}_{j}(\|w_0\|, \zeta, \vartheta)$, for the appropriate value of $j$,
$j = 1, 2, \ldots, (|{}^{w_0}_{r}\mathcal{P}^{n}(\zeta, \vartheta)| - 1)$, where
${}^{w_0}_{r}\Theta^{n}_{j}(\|w_0\|, \zeta, \vartheta)$ contains the location of the $i$th
level transition.

3. Assignments to the superposition coefficients $v_i$,
$i = 1, 2, \ldots, m$, and bias $\theta$ follow the minimum norm solution to a
system of linear equations:

= (y^y) f'l, 


where, 

y 1 

iT 1 

and A* is the Moore-Penrose pseudo inverse of the matrix A £ 

The approximation in step 1 of the above theorem can always be 
assured in view of the denseness of the discrete space — P"(C)32) in 
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the interval C 4- (Yn-fj I + where the scale 

Tl 

factor a = ||i£|| ( ^ ^ as r tends to 4-oo. (Note that a is the scale 

t=i 

factor of the (non-null) weight $w$ in relation to a preservance weight of
the discrete space ${}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$.) The steps involved in the
identification of the preservance input space
${}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta)$ will be discussed in § 4.3 (p. 208).
The above theorem is more in the nature of a statement assuring the
existence of a representation.
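Steps 2 and 3 of the theorem can be sketched for the bipolar binary cube, taking step 1 as already done. The sketch makes several assumptions of its own: the preservance input space is $\{-1, +1\}^n$ with a given common weight, thresholds are placed at the midpoint of the gap holding each level transition (any point of the gap would do), and the coefficients are read off from successive differences rather than computed with a pseudo-inverse. Because the resulting linear system is consistent and has full column rank, the two should coincide here. The function names are hypothetical.

```python
from itertools import product

def hard_limiter(u):
    return 1 if u >= 0 else -1

def learn_minimal(dichotomy, w):
    """Return (thresholds, coeffs, bias) realizing a bipolar dichotomy on {-1, +1}^n."""
    points = sorted(product((-1, 1), repeat=len(w)),
                    key=lambda x: sum(wi * xi for wi, xi in zip(w, x)))
    proj = [sum(wi * xi for wi, xi in zip(w, x)) for x in points]
    seq = [dichotomy(x) for x in points]
    thresholds, coeffs = [], []
    for k in range(len(seq) - 1):
        if seq[k] != seq[k + 1]:
            thresholds.append((proj[k] + proj[k + 1]) / 2)  # midpoint of the gap
            coeffs.append((seq[k + 1] - seq[k]) / 2)         # a node output jumps by 2
    bias = -sum(coeffs) - seq[0]  # fixes the level at the first ordered point
    return thresholds, coeffs, bias

def realize(x, w, thresholds, coeffs, bias):
    p = sum(wi * xi for wi, xi in zip(w, x))
    return sum(v * hard_limiter(p - t) for v, t in zip(coeffs, thresholds)) - bias

w = (1, 2, 4)
parity = lambda x: x[0] * x[1] * x[2]
thresholds, coeffs, bias = learn_minimal(parity, w)
outputs = {x: realize(x, w, thresholds, coeffs, bias)
           for x in product((-1, 1), repeat=3)}
```

The number of nodes equals the number of level transitions in the equivalent sequence, so a different admissible weight may give a smaller architecture for the same dichotomy.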

As seen in the case of admissibility of preservance weights to a given
dichotomy, minimal architectures for a given dichotomy are not unique.
The chief reason for such a non-unique representation in neural signal
processors can easily be traced to Proposition 3.2.13 (p. 152). (Note
that non-uniqueness in the sense of permutations of the coefficients
$v$ with a corresponding permutation of weights and thresholds of the
processing nodes is not being considered.) It is of interest to note that
non-uniqueness in the minimal single layer neural signal processing
architecture, due to the multiplicity of preservance weights representing
a given dichotomy even at the level of the equivalent sequences over
$\mathcal{L}_w$, $w$ being a non-null weight in $\Re^n$, cannot be resolved by the
criterion of generalization.

Representation of dichotomies in single layer neural signal processors,
discussed till now in the restricted case wherein weights of all
processing nodes are identical, when extended to a situation permitting
processing node weights to be members of ${}^{w}\mathcal{P}_n$, subject to the
restriction that all processing nodes employ weights of identical norm,
implies the following.

THEOREM 4.1.6 A problem of representing a given bipolar bivalent
function $\mathcal{D}: {}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta) \to \{C^-, C^+\}$, for a non-null
$w \in \Re^n$, in a single layer neural signal processor with $m$ processing
nodes, $m < n!\,2^n$, and weights $w_i \in {}^{w}\mathcal{P}_n$, $i = 1, 2, \ldots, m$,
such that $\|w_i\| = \|w_1\|$, $i = 1, 2, \ldots, m$, is equivalent to the
problem of representing $\mathcal{D}$ in a single layer neural signal processor of
$m$ nodes wherein the weights of all processing nodes are identical, the
common weight being any of $w_i$, $i = 1, 2, \ldots, m$.

PROOF: Noting that

a) the collection of weights described by ${}^{w}\mathcal{P}_n$,
$\|w_i\| = \|w_1\|$, $i = 2, 3, \ldots, |{}^{w}\mathcal{P}_n|$, a space isomorphic to
${}_{r}\mathcal{P}_n(\alpha)$, for any $\alpha \in \Re^+$, as indicated in § 3.4, are
described in terms of any specific weight, in that space, through
permutations similar to those indicated in Theorem 3.1.4 (p. 124), and

b) the procedure for determining $v$, the coefficients of linear
combination of the responses of the $m$ distinct neurons, is not dependent
on $y_i$, $i = 1, 2, \ldots, m$, being linearly separable,

the statement is immediately apparent.

□

4.2 Representation in Multi Layer Neural Signal
Processors

Single layered neural signal processors, shown to be adequate in
representing multivariate functions by Hornik, Stinchcombe & White
(1989), Cybenko (1990) and in the preceding discussion (the discussion
was restricted to the realization of bipolar dichotomies on preservance
input spaces), suffer from the requirement that the number of processing
nodes be exponentially dependent on the dimensionality of the input
space as well as the number of disconnected components in the inverse
images of the dichotomy under consideration. This exponential dependence
is directly related to the number of allowed sign transitions in
the sequences over $\mathcal{L}_w(\|w\|, {}^{w}_{r}\mathcal{P}^{n}(\zeta, \vartheta))$, for any
$w \in {}^{w_0}\mathcal{P}_n$, $\|w\| = \|w_0\|$, with $w_0 \in \Re^n$, $\|w_0\| \neq 0$, and
inputs restricted to the preservance input space
${}^{w_0}_{r}\mathcal{P}^{n}(\zeta, \vartheta) \subset \Re^n$ for admissible values of $r$, $\zeta$
and $\vartheta$.

Representation of functions in single layer neural signal processors
will exhibit such an exponential dependence regardless of the nature
of activation functions which accommodate a fixed finite number of sign
transitions. It is thereby imperative to investigate approaches that
would aid a reduction in the representational complexity of dichotomies.
The representational complexity is to be interpreted as the number
of distinct basic processing nodes that need to be committed to achieve
the desired representation relative to the number of similar (identical)
nodes required in a minimal single layer neural signal processor.
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Multi layered neural signal processors of the strictly feed-forward 
variety, in the light of Equation 4.1 (p. 188), are defined by the following 
operational schema (as distinct from the definition indicated in § 2.2): 

J/jO) (a) = = 1, 2, . . .nil, (4.4a) 

y% (x) = ) > 

= 1,2,. .»)^,f = l,2,. k, (4.4b) 
- 6, (4.4c) 

for some a priori specified values of k and 7ni, k,me = 1,2, ; ( = 

1.2. . ..fc. In this equation, as already indicated in Chapter 2, 

is the weight, 0{ll^ is the threshold, and indicates the response 
of a processing node {ie, neuron) in layer £, the node index being 
and is the response of a neural signal processor formed by linearly 
combining the outputs of a neural network of £ layers. 
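The schema of Equations 4.4a through 4.4c can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the activation is taken to be a bipolar hard limiter with the convention sigma(0) = +1, and the names `sign`, `dot` and `forward`, as well as the sample weights and thresholds, are hypothetical and do not appear in the thesis.

```python
def sign(u):
    """Bipolar hard limiter; sigma(0) = +1 is an assumed convention."""
    return 1 if u >= 0 else -1


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def forward(x, layers, v, theta):
    """Response of a k-layer neural signal processor.

    layers is a list of (weights, thresholds) pairs, one pair per layer;
    weights holds one weight vector per node of that layer.
    """
    y = list(x)                         # (4.4a): layer 0 passes the input through
    for weights, thresholds in layers:  # (4.4b): decisions of each layer in turn
        y = [sign(dot(w, y) - t) for w, t in zip(weights, thresholds)]
    return dot(v, y) - theta            # (4.4c): shifted linear combination


# Hypothetical one-layer example: two nodes sharing the weight (1, 2)
# and differing only in their thresholds.
layers = [([(1, 2), (1, 2)], [0.0, 2.5])]
response = forward((-1, 1), layers, v=(1, 1), theta=0)
```

Sharing one weight vector across the nodes of the first layer mirrors the common-weight setting discussed in this chapter; only the thresholds distinguish the nodes.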

In view of Theorem 4.1.6 (p. 200) it is not difficult to visualize that
the response of a two layer neural signal processor wherein the weights
in the processing nodes of the first layer belong to the preservance
weight space subject to the restriction that $\|w^{(1)}_{j}\| = \|w_0\|$,
$j = 1, 2, \ldots, m_1$, $w_0 \in \Re^{n}$, $\|w_0\| \neq 0$, is indeed
equivalent to a superposition

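of $m_2$ functions, each of which is a bipolar bivalent sequence on the
discrete space
$\mathcal{L}_{\Re}(\|w_0\|, {}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta))$.
(Note that
$\mathcal{L}_{\Re}(\|w_0\|, {}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)) = \mathcal{L}_{\Re}(\|w\|, {}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta))$
as ${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta) = {}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$
for all $w \in {}_{r}\mathcal{P}_{n}$ such that $\|w\| = \|w_0\|$.) The bipolar
bivalent sequences (on
$\mathcal{L}_{\Re}(\|w_0\|, {}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta))$)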

are responses of appropriate single layer neural signal processors and
the number of distinct sign transitions in each of these sequences is no
more than the number of processing nodes in the first layer; this
observation results as a simple consequence of Theorem 4.1.1 (p. 191).

The following characterization, in view of the above mentioned
feature, about the equivalent sequences representing functions realized
by two layer neural signal processors is worth noting.

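THEOREM 4.2.1 The number of distinct level transitions in the discrete
sequences over
$\mathcal{L}_{\Re}(\|w_0\|, {}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta))$,
for any $w_0 \in \Re^{n}$, $\|w_0\| \neq 0$, $n = 1, 2, \ldots$;
$\tau = 1, 2, \ldots$; $\zeta \in \Re_{+}$ and $\vartheta \in \Re^{n}$, resulting
from a two layer neural signal processor wherein the weights
corresponding to processing nodes in the first and second layers are
given by $w^{(1)}_{j} \in {}_{r}\mathcal{P}_{n}$, $\|w^{(1)}_{j}\| = \|w_0\|$,
$j = 1, 2, \ldots, m_1$, and $w^{(2)}_{j} \in \Re^{m_1}$,
$j = 1, 2, \ldots, m_2$, is bounded above by

$$\min\bigl\{m_2 m_1,\ |{}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)| - 1\bigr\}.$$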

Since this statement is on the same lines as Theorem 4.1.1 (p. 191), 
a proof is not required. A generalization of the above theorem to the 
case of multi-layered neural networks follows. 

THEOREM 4.2.2 The number of distinct level transitions in the discrete
sequences over
$\mathcal{L}_{\Re}(\|w_0\|, {}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta))$,
for any $w_0 \in \Re^{n}$, $\|w_0\| \neq 0$, $n = 1, 2, \ldots$;
$\tau = 1, 2, \ldots$; $\zeta \in \Re_{+}$ and $\vartheta \in \Re^{n}$, resulting
from a $k$ layer neural signal processor, $k = 1, 2, \ldots$, wherein the
weights corresponding to processing nodes in the various layers are given
by $w^{(1)}_{j} \in {}_{r}\mathcal{P}_{n}$, $\|w^{(1)}_{j}\| = \|w_0\|$,
$j = 1, 2, \ldots, m_1$, and $w^{(\ell)}_{j} \in \Re^{m_{\ell-1}}$,
$j = 1, 2, \ldots, m_\ell$, $\ell = 2, 3, \ldots, k$, is bounded above by

$$\prod_{i=1}^{k} m_i.$$

The above two theorems suggest that a study of representation in 
multi-layered neural signal processors is analogous to a corresponding 
study in single layer neural signal processors, though with processing 
nodes that incorporate activation functions that induce multiple level 
(sign) transitions. An immediate implication is that the approach to 
learning suggested by Theorem 4.1.5 (p. 197) and Theorem 4.1.6 (p. 200) 
is applicable, though with appropriate changes, to the case of multi- 
layered neural signal processors also. 
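The quantity that these theorems bound, the number of sign transitions in a bipolar sequence, is easy to compute directly. The following is a minimal sketch and not part of the thesis: the function names are hypothetical, the product of the layer sizes is assumed as the form of the bound for several layers, and the cap at one less than the number of points follows the form of the bound stated for two layers.

```python
from math import prod


def sign_transitions(seq):
    """Count adjacent pairs of a bipolar sequence whose values differ."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def transition_bound(layer_sizes, n_points):
    """Product of the layer sizes, capped by n_points - 1.

    A sequence over n_points ordered points cannot change sign more than
    n_points - 1 times, whatever the network looks like.
    """
    return min(prod(layer_sizes), n_points - 1)


# A hypothetical decision sequence over six ordered points.
example = [1, 1, -1, 1, -1, -1]
observed = sign_transitions(example)
```

For instance, two layers of sizes 3 and 2 over a space of eight points give a bound of 6, whereas sizes 3 and 4 are capped at 7 by the size of the space.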

Recall Equation 4.4c. While the procedure for determining the
coefficients $v_j$, $j = 1, 2, \ldots, m_k$, and bias $\theta$ in a $k$,
$k = 1, 2, \ldots$, layered neural signal processor is the same as that
indicated in step 3 of Theorem 4.1.5, the matrix $Y$, in the case of
multi-layered neural signal processors, consists of sequences over
$\mathcal{L}_{\Re}(\|w_0\|, {}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta))$
with multiple sign transitions. ($w_0$ is assumed to be a preservance
weight for the discrete preservance input space.) Note that these
sequences correspond to neural responses of a (vector-valued) neural
network of $k$ layers (for a $k$ layered neural signal processor). The
sequences are representative of the bipolar bivalent (decision)
functions over ${}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$ that are
linearly combined (with coefficients $v_j$, $j = 1, 2, \ldots, m_k$) to
realize (approximate) the desired function.

To facilitate a procedure for determining the values of the
coefficients ($v$) of linear combination I will denote by
${}_{s_k}\mathcal{Y}^{(k)}$ the collection of sequences (over
$\mathcal{L}_{\Re}(\|w_0\|, {}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta))$)
corresponding to the decisions of a $k$ layered neural network with no
more than $s_k$ sign transitions. Theorem 4.2.2 (p. 203) states an upper
bound for the value of $s_k$. Symbolically,

$${}_{s_k}\mathcal{Y}^{(k)} = \Bigl\{\, y^{(k)}_{j} \;\Big|\; y^{(k)}_{j} \text{ such that } \tfrac{1}{2} \sum_{i=1}^{|{}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)| - 1} \bigl| y^{(k)}_{j}(x_{i+1}) - y^{(k)}_{j}(x_i) \bigr| \leq s_k \,\Bigr\}$$


Denote by the array the following: 


Then, in a manner analogous to step 3 in Theorem 4.1.5 (p. 197), a
solution for the coefficients of linear combination ($v$) and the bias
($\theta$) is given by

V 

- whore y = 

}kyW 1 

e 

\ / - 

1 

IH-* 

J 


Note that given a value of $s_k$, subject to the upper bound suggested by
Theorem 4.2.2, the matrix ${}_{s_k}Y^{(k)}$ is completely specified for the
(discrete) preservance input space
${}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$. This matrix is not
dependent on the specific values assigned to the coefficients of linear
combination ($v$) and threshold ($\theta$). An immediate implication is that
each row of ${}_{s_k}Y^{(k)}$, a sequence over
$\mathcal{L}_{\Re}(\|w_0\|, {}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta))$,
states the collection of assignments, over
${}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$, for a distinct node in
the $k$-th layer.

Recall Proposition 4.1.5 and Proposition 4.1.6. In the context of
multi-layered neural signal processors, these propositions state the
dependence of the number and location of level transitions in the
response⁵ $\eta^{(\ell)}$ on the (number and) location of sign transitions in
the decisions $y^{(\ell)}_{j}$ for all values of $\ell$,
$\ell = 1, 2, \ldots, k$. (Note that the response $\eta^{(\ell)}$ for
$\ell \neq k$ is defined, analogous to Equation 4.4c, as a shifted linear
combination of the decisions $y^{(\ell)}_{j}$, $j = 1, 2, \ldots, m_\ell$.)

On reverting back to the situation wherein the matrix ${}_{s_k}Y^{(k)}$ is
completely known given $s_k$ and
${}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$, it is of interest to know
the converse of Proposition 4.1.5 and Proposition 4.1.6, ie, the dependence
of the level transitions, in number and location, of the decisions
on the (number and) location of sign-transitions in the responses
$\eta^{(\ell)}$, $\ell = 1, 2, \ldots, k-1$. The converses are similar in nature
to Proposition 4.1.5 and Proposition 4.1.6, ie, the location of every sign
transition in
for all = 1,2, . , is inherited from a level-transition of , 

for some = 1,2, ..m<, and every level transition in for all 
= 1,2, . . mi, contributes to at most one sign-transition in 
for some = 1,2,... 

Thus, a knowledge of the matrix ${}_{s_k}Y^{(k)}$ provides an insight into
the locations of level crossings in the responses of the $(k-1)$-th layer.

A repeated application of the above approach to elicit the locations of
level transitions in the responses of the $(\ell-1)$-th layer with a
knowledge of the locations of level transitions in the $\ell$-th layer
suggests a scheme which is analogous to the back-propagation procedure
common in the literature. The specific details of the scheme of
back-calculation of the locations of level transitions are not presented
in this preliminary discussion. Each back-calculation step necessitates
a foreknowledge (or a judicious selection) of the largest number of sign
transitions that will be tolerated in the decisions of the corresponding
layers.

⁵References, in this section, to $\eta$ and $y$, with appropriate layering and
node indices, are to be interpreted as the equivalent sequences
corresponding to these functions on the discrete space
$\mathcal{L}_{\Re}(\|w_0\|, {}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta))$.

From the preceding discussion it is amply clear that it is not
impossible to realize, in a multi-layer neural signal processor, a given
dichotomy on ${}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$,
$w_0 \in \Re^{n}$, $\|w_0\| \neq 0$, and admissible values for $n$, $\tau$,
$\zeta$ and $\vartheta$, with fewer total processing nodes than necessary
in a single layer neural signal processor as given any integer, say
$m$, $m = 1, 2, \ldots$, it is not infeasible to choose $k$ integers
$m_1, m_2, \ldots, m_k$, $m_\ell = 1, 2, \ldots, m-1$, $\ell = 1, 2, \ldots, k$,
$k = 2, 3, \ldots$, such that $\prod_{\ell=1}^{k} m_\ell \geq m$ whereas
$\sum_{\ell=1}^{k} m_\ell < m$. The reduction in the number of processing
nodes required in a multi-layered neural signal processor relative to
the requirement in a minimal single layer neural signal processor can
easily be traced to the multiplicity of the sign-transitions in the bipolar
bivalent sequences (over
$\mathcal{L}_{\Re}(\|w_0\|, {}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta))$)
representing functions (dichotomies) over
${}^{w_0}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$. Within the limited scope
of this thesis, this aspect will not, however, be elaborated.

4.3 Learning of Weights: Identification of preservance input spaces

In Chapter 3, the representation of functions (on discrete spaces) in
isolated neurons has been shown to be influenced, in the sense of a
simplification, by a choice of weights that are related to the (discrete)
input space: the specific relation considered is that of preservance.
Analogous to the representation of functions in isolated neurons, the
representation of functions (processors) in multi-layered neural signal
processors is influenced by a choice of preservance weights in the first
layer.⁶

The role of preservance, as suggested by the definition, is to main- 
tain uniqueness and orderings between the distinct points of the input 
space: such a uniqueness can, however, be established only when the 
input space is discrete. In multi-layered neural signal processors, a 
choice of preservance weights is tantamount to a representation of the 
input space. With this interpretation, it is not incorrect to suggest that 
the processing in isolated neurons, and their ensembles, involves an 
internal representation of the (relevant) input space and the processor 
realization is based on this internal representation. 
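The uniqueness aspect of preservance can be illustrated with a small check. The sketch below is an assumption-laden reading rather than the definition given in Chapter 3: it treats a weight as preserving a discrete set when the projection of each point onto the weight is distinct, and it uses the powers-of-two weight (1, 2, 4) on the bipolar cube purely as a hypothetical example. All function names are illustrative.

```python
from itertools import permutations, product


def dot(w, x):
    """Inner product of a weight vector and an input vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def preserves_uniqueness(w, points):
    """True when every point has a distinct projection onto w (assumed reading)."""
    projections = [dot(w, x) for x in points]
    return len(set(projections)) == len(projections)


def signed_permutations(base):
    """All reorderings of base combined with all sign changes of its elements."""
    for perm in permutations(base):
        for signs in product((1, -1), repeat=len(base)):
            yield tuple(s * p for s, p in zip(signs, perm))


cube = list(product((-1, 1), repeat=3))       # the 8 bipolar vectors in 3 dimensions
candidates = list(signed_permutations((1, 2, 4)))
all_preserve = all(preserves_uniqueness(w, cube) for w in candidates)
```

The weight (1, 1, 1) fails the check because several bipolar vectors share a projection, while each of the 48 signed permutations of (1, 2, 4) passes it. The count 48 equals 3! times 2 to the power 3 for n = 3.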

In this section I will consider the issue of identifying a discrete
space given $T_i$, the inputs contained in a training set, such that the
discrete space is a preservance input space corresponding to some
non-null weight vector in $\Re^{n}$. Equivalently, the problem is one of
specifying the weight of an isolated neuron (or the common weight in the
first layer of a multi-layered neural signal processor) such that a
discrete input space which embeds an approximation (in the sense of the
Euclidean norm) of $T_i$ is preserved by the chosen weight.

⁶While it is not infeasible to discuss a situation wherein the weights of each layer,
in a multi-layered neural signal processor, are preservance weights corresponding to the
input space presented to that layer, such an investigation has not been attempted in
this thesis.

Recall that preservance input spaces are denoted by
${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$. In this notation $n$ is the
dimensionality of the Euclidean space in which this discrete space is
embedded, $r$ is the radix of numbering, ie, the number of distinct
(discrete) values that are allowed in each element of the $n$-vectors
(ie, $n$-tuples of numbers) in the input space, $\tau$ refers to the degree
of ranking, $\zeta$ is the common scale factor and $\vartheta$ is the common
translation that every point in the input space is subjected to: the
scale and translation are measured in relation to the elements of the
collection of binary vectors in $\mathcal{B}^{n}$; $w$ indicates the
direction⁷ (ie, $w / \|w\|$) of a preservance weight of the discrete
collection of points in ${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$. An
identification of a preservance input space given $T_i$ involves a
specification of all the above components.

In the following discussion, I assume that the dimensionality $n$ of
the Euclidean space from which the training inputs are drawn is known
a priori. A few comments are in order about the nature of the discrete
set $T_i$ in relation to the discrete space
${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$, for some appropriately
identified values of $r$, $\zeta$, $\vartheta$, $\tau$ and $w$ (the
dimensionality $n$ is given). Recall the construction of the discrete space
${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$, $w \neq 0$ (Equation 3.11
(p. 179)). For any non-null $w \in \Re^{n}$,
${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$ is equivalent to a 'rotation'
(actually permutation) of ${}_{r}\mathcal{P}^{n}_{\tau}(\zeta, 0)$ followed by a
translation.

⁷Note that there are $n!\,2^{n}$ distinct preservance directions for every $n$ dimensional
(preservance) input space.

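The operator permuting (rotating) ${}_{r}\mathcal{P}^{n}_{\tau}(\zeta, 0)$ to
${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, 0)$ is governed by the linear
transformation that maps to the weight $w$ the preservance weight
$w_{\langle c \rangle}$, for an appropriate value of $c$,
$c = 0, 1, \ldots, n!\,2^{n} - 1$, that is closest, in the sense of the
Euclidean distance, to $w$ in ${}_{r}\mathcal{P}_{n}$, the class of
preservance weights of ${}_{r}\mathcal{P}^{n}_{\tau}(\zeta, 0)$, subject to the
restriction that $\|w_{\langle c \rangle}\| = \|w\|$. A translation of all
points in ${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, 0)$ by $\vartheta$ results in
the collection denoted by ${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$.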

Note that the discrete space ${}_{r}\mathcal{P}^{n}_{\tau}(\zeta, 0)$ is
obtained as a union of certain scaled and translated collections of
binary vectors: the scale and translation factors operating on the
component binary vector collections are described in Equation 3.7
(p. 172). Each collection of binary vectors, denoted by
$\mathcal{B}^{n}(\zeta, \vartheta)$, for a scale factor $\zeta \in \Re_{+}$ and
a translation $\vartheta \in \Re^{n}$, has the algebraic properties of a
lattice (Boolean algebra)⁸ under a suitably identified pair of binary
operations operating on the collection.⁹

Though not all unions of lattices (algebras) result in algebraic
structures that are lattices (algebras), the specific scale factor and
translations chosen in the construction of
${}_{r}\mathcal{P}^{n}_{\tau}(\zeta, 0)$ ensure that this discrete space is also
a lattice (algebra): this aspect will be considered, more formally, in
the next section. Since ${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$ is
obtained from ${}_{r}\mathcal{P}^{n}_{\tau}(\zeta, 0)$ by a norm preserving
permutation¹⁰ followed by a translation, the algebraic characteristics
of ${}_{r}\mathcal{P}^{n}_{\tau}(\zeta, 0)$ are inherited by
${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$. This implies that
${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$ has the algebraic structure
of a lattice (algebra); however, it is not essential that binary
operations on ${}_{r}\mathcal{P}^{n}_{\tau}(\zeta, 0)$ be inherited identically
by ${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$.

⁸For this reason the term Boolean space for the collection of binary vectors
$\mathcal{B}^{n}$ and its generalization $\mathcal{B}^{n}(\zeta, \vartheta)$ is not
inappropriate.

⁹The most commonplace examples are the logical operations of conjunction and
disjunction.

A specification of a weight $w$ in an isolated neuron such that $w$
preserves $T_i$ in $\Re^{n}$ implies that with appropriate values for $r$,
$\tau$, $\zeta$ and $\vartheta$, the preservance, by weight $w$, is applicable
not only to the points in the discrete set $T_i$ but to the entirety of
the discrete space ${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$. Note that
for the specification of the weight and other parameters to be
meaningful, $T_i \subseteq {}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$. As
the discrete space ${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$, given $w$,
$r$, $\tau$, $\zeta$ and $\vartheta$, is a unique subset of the Euclidean space
$\Re^{n}$, the term identification, in this discussion, is not
inappropriate.

The applicability of preservance, given a set of training inputs, to
discrete sets that are not smaller than $T_i$ suggests that generalization
is not restricted merely to an extension of function specification over
input regimes different from (or larger than) those stated through a
training set. (In § 2.2 and in the discussions of Chapter 3 and the
previous sections of this Chapter generalization is interpreted as
function extension.) As

¹⁰This is the case as the permutation operator mapping a preservance weight
$w_{\langle c \rangle}$ to a (non-null) weight $w$ is restricted to ensure
$\|w_{\langle c \rangle}\| = \|w\|$.





a consequence of generalization, the algebraic characteristics aiding a
representation of the inputs, limited to the subset $T_i$, are extended to a
larger subset (of $\Re^{n}$): this extension is based on a preservance of
$T_i$ in the (internal) representation provided by the weight $w$.

Given a training set, in particular the set $T_i$, the translation that
all points in the discrete set ${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, 0)$ are
to be given in order to derive the set
${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$ is given by the procedure
indicated in Figure 4.1. The search for the translation vector
$\vartheta$ is based on the observation that all points of the basic
collection of binary vectors $\mathcal{B}^{n}(\zeta, 0)$, for all values of
$\zeta \in \Re_{+}$, lie on a (hyper) sphere of radius $\sqrt{n}\,\zeta$.
Step 2 is equivalent to the following constrained search operation:

$$T_i(x_j) = \max\; T_i \cap \mathcal{B}^{n}(\zeta, 0),$$

for some $x_j \in T_i$, $j = 1, 2, \ldots, |T_i|$, subject to the constraint
that $T_i(x_j) \neq \emptyset$. (Note that the set of training inputs is
assumed to be non-empty.) The translation $\vartheta$ is identified, in
step 3, with the centroid of the set $T_i(x_j)$.
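A rough sketch of this search is given below. It is not the procedure of Figure 4.1 verbatim: step 2 is implemented as stated (the points of the training set farthest from a chosen member), but for step 3 the sketch assumes that the chosen member itself is taken into account, so that the candidate translation is the midpoint between the chosen member and the centroid of its farthest set. That assumption, the function names, and the sample points are hypothetical.

```python
from math import isclose, sqrt


def dist(a, b):
    """Euclidean distance between two points."""
    return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def farthest_set(points, xj):
    """Step 2: members of points at the largest distance from xj."""
    d_max = max(dist(xj, x) for x in points)
    return [x for x in points if isclose(dist(xj, x), d_max)]


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(n))


def candidate_translation(points, xj):
    """Step 3 (assumed variant): midpoint of xj and the centroid of its farthest set."""
    c = centroid(farthest_set(points, xj))
    return tuple((a + b) / 2 for a, b in zip(xj, c))


def equidistant(theta, points):
    """Check that every point lies at the same distance from theta."""
    d0 = dist(points[0], theta)
    return all(isclose(dist(p, theta), d0) for p in points)


# Hypothetical training inputs: a square of half-width 2 centred at (3, 5).
square = [(1, 3), (5, 3), (1, 7), (5, 7)]
theta = candidate_translation(square, (1, 3))
```

For the square above the candidate is (3.0, 5.0), and all four training points are equidistant from it, which is the constant-distance condition that step 3 asks for.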

Given $n$, $n = 1, 2, \ldots$, and $T_i$, a finite subset of $\Re^{n}$

do

1. Choose any member, say $x_j$, $j = 1, 2, \ldots, |T_i|$, from $T_i$.

2. Construct the set $T_i(x_j) \subseteq T_i$ such that

   $T_i(x_j) = \{\, x \mid x \in T_i \text{ and } \|x_j - x\| = \max_{x_l \in T_i} \|x_j - x_l\| \,\}$

3. Set $\vartheta$ to a value in $\Re^{n}$ which ensures that
   $\forall x \in T_i(x_j)$, $\|x - \vartheta\| =$ a constant.

done


Figure 4.1: Procedure for determining $\vartheta$


Figure 4.2 (p. 214) lists a procedure for determining the radix $r$,
scale factor $\zeta$, and ranking index $\tau$ that enable a construction of
the discrete space ${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$ given the
collection of training inputs $T_i$: the procedure for determining the
translation $\vartheta$ is invoked to simplify the identification of the
parameters of the discrete space
${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$. Step 2 accomplishes the
centering of the training inputs. The collection of centered training
inputs is denoted by $T_i'$. In step 3, a partitioning is induced on
$T_i'$ on the basis of the norm of the vectors in the collection, and the
discrete space $\tilde{T}_i \subseteq T_i'$ is constructed by choosing one
(representative) input vector from each member of the partition on
$T_i'$.
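The centering, the partitioning by norm, and the function of adjacent norm differences used by the procedure of Figure 4.2 can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the tolerance used to decide that two norms are equal, the function names, and the sample inputs are all assumptions.

```python
from math import isclose, sqrt


def norm(x):
    """Euclidean norm of a vector."""
    return sqrt(sum(c * c for c in x))


def center(points, theta):
    """Translate every point by -theta (centering of the training inputs)."""
    return [tuple(a - b for a, b in zip(x, theta)) for x in points]


def norm_representatives(points, tol=1e-9):
    """Keep one representative per distinct norm, ordered by increasing norm."""
    reps = []
    for x in sorted(points, key=norm):
        if not reps or not isclose(norm(x), norm(reps[-1]), abs_tol=tol):
            reps.append(x)
    return reps


def norm_gaps(reps):
    """Differences between the norms of consecutive representatives."""
    return [norm(b) - norm(a) for a, b in zip(reps, reps[1:])]


# Hypothetical training inputs and translation.
inputs = [(4, 5), (2, 5), (3, 6), (3, 8), (3, 2)]
centered = center(inputs, (3, 5))
reps = norm_representatives(centered)
gaps = norm_gaps(reps)
```

Here the centered inputs have norms 1, 1, 1, 3 and 3, so two representatives survive and the single gap between them is 2.0.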

Given $n$, $n = 1, 2, \ldots$, and $T_i$, a finite subset of $\Re^{n}$

do

1. Find $\vartheta$ appropriate to $T_i$.

2. Construct the finite set $T_i'$ from the set $T_i$ with a translation
   of $-\vartheta$, ie, $T_i' = \{\, x - \vartheta \mid x \in T_i \,\}$.

3. Construct the finite set $\tilde{T}_i$ from $T_i'$ in the following manner:

   $\tilde{T}_i = \{\, x_j \mid x_1 = \arg\min_{x \in T_i'} \|x\| \text{ and } \|x_j\| > \|x_{j-1}\|,\ j = 2, \ldots, |\tilde{T}_i| \,\}$

4. Construct the one-dimensional function, say $f$, on
   $\tilde{T}_i \setminus \{x_1\}$ such that
   $f(x_j) = \|x_j\| - \|x_{j-1}\|$, $j = 2, \ldots, |\tilde{T}_i|$.

5. Set $r = 2$ if the function $f$ is unimodal, else set $r$ to be the
   same as the number of modes of $f$.

6. Let $\Delta$ be the width of the largest cluster in $f$. Set
   $\zeta = \Delta (r - 1)$.

7. Let $\delta = \min_{x \in \tilde{T}_i \setminus \{x_1\}} f(x)$. Set $\tau =$

done


Figure 4.2: Procedure for determining $r$, $\zeta$ and $\tau$


The identification of the radix $r$, scale factor $\zeta$ and ranking index
$\tau$ is based on certain observations on the (discrete) preservance
input space ${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$. Recall the
construction of the preservance input space (cf. Chapter 3). (Figure 3.5
(p. 134) illustrates a preservance input space of 2 dimensions and
ranking index 3 with a preservance weight in ${}_{r}\mathcal{P}_{2}$.) As the
preservance input space is constructed by associating to each
'peripheral' input point (the (ordered) points of the discrete set
${}_{r}\mathcal{P}^{n}_{\tau-1}(\zeta, \vartheta)$, $n = 1, 2, \ldots$;
$\zeta \in \Re_{+}$ and $\vartheta \in \Re^{n}$, in the construction described
by Equation 3.7 (p. 172)) an $n$-dimensional scaled (and translated)
collection of binary vectors, the specific values of the scale factor
and the translation (governed by an exponential function of the ranking
index $\tau$ in a base given by the radix $r$) order the vectors in
$\tilde{T}_i$ to be separated, in the norm, in a non-uniform manner.

Figure 3.9 (p. 149) illustrates the nature of non-uniformity in the
spacing, in the sense of the norm, of the vectors in $\tilde{T}_i$: note
that, by definition, the projection of a vector $x$ along another vector
$w$ (from a common inner product space) is proportional to $\|x\|$, the
norm of $x$. The function constructed in step 4 of the procedure in
Figure 4.2 allows a relative evaluation of the intervals between
adjacent projection points of the vectors in $\tilde{T}_i$ in the linear
subspace spanned by some non-null vector $w \in \Re^{n}$.

Notice that the discrete set $\tilde{T}_i$ is constructed in such a manner
as to retain only one point from every scaled collection of binary
vectors whose members are found in $T_i'$. This construction ensures that
the number of modes in the function $f$ constructed in step 4 is no more
than the smallest radix $r$ necessary in describing the vectors in the
collection $T_i$. (Note that translation, scaling and rotation do not
affect the radix of numbering.) More accurately, $f$ is unimodal if the
members of $T_i$ are derived from scaled and translated collections of
binary vectors. For radices that are higher than 2, the number of modes
in $f$ is identical to the radix. This aspect is incorporated in step 5.

Proposition 3.1.9 (p. 136) shows that the discrete space
$\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$ contains points of $\Re^{n}$ sampled
from open balls of finite radius and centered at the vertices of
$\mathcal{B}^{n}(\zeta, \vartheta)$. This property is also true of preservance
input spaces whose radix is higher than 2. The function $f$ constructed
on the discrete set $\tilde{T}_i$ essentially clusters the vectors around
the modal values as the value of $f$ shrinks exponentially around each
modal value. Step 6 sets the scale factor $\zeta$ to accommodate the
widest cluster observed in $\tilde{T}_i$. In step 7, the ranking index is
assigned a value which will allow the smallest interval between adjacent
points in $\tilde{T}_i$ (adjacency is in the sense of the norm) to be
recreated in the preservance input space
${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$.

Figure 4.3 (p. 217) lists a procedure for determining a weight $w$
that will be a preservance weight for the given collection of inputs
in the training set. The basis for this procedure is that the discrete
space ${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, 0)$ is a rotated version of
${}_{r}\mathcal{P}^{n}_{\tau}(\zeta, 0)$. The constructions in the procedure are
aimed at facilitating an evaluation of approximation only over vectors
of equal norms. I have assumed that the ranking index $\tau$ is adequate
to allow every input vector in the training set to be approximated by a
vector in the (centered) preservance input space with no error in the
norm: a relaxation of this assumption will
necessitate a revision of the construction of -tP(0 in step 5 to let the 
argument ^ of the set -».P(^) take on values in (connected) intervals of 
the set - 1 ‘V. 

Approximation of vectors in the collection of training input points
by the vectors of the preservance input space is governed by an error

Given $n$, $n = 1, 2, \ldots$, $T_i$, a finite subset of $\Re^{n}$, and $\epsilon > 0$

do

1. Find $r$, $\zeta$ and $\tau$ appropriate to $T_i$.

2. Construct the discrete sets $T_i'$ and $\tilde{T}_i$ as in steps 2 and 3,
   respectively, of the procedure in Figure 4.2.

3. Partition the set 7^\ on the basis of the norm, to derive the 
sets 

= Il£ll = ^} V^e7; 

4. Set A: = 1. Let {0} be the current estimate of w- 

5. Construct the following sets: 7*“ (C, 0), 

Hj.'p = J X € I Ikll 

i e€t^^:(c.Q) 

and llsjl > Ikj-ill ,i = 2 , • |f’‘'Pr(C.Q)l 

and = {$ U € Ikll = ^} V? 6 

6 Evaluate e = |T[ (^) — 

«eT'(on ^k-p«) 

7. If $e > \epsilon$ find a new guess based on the current estimate and $e$,
   increment $k$ and repeat steps 5 and 6.

8. Set $w$ to the current estimate.

done 


Figure 4.3: Procedure for determining $w$

criterion stating the mismatch (in terms of an appropriately chosen
metric) of the collection of vectors of similar (identical) norms, ie, the
error criterion measures the mismatch (or unsatisficability) of X (0 
with respect to for values of ^ that occur in f, as well as P, 

for all values of $k$, $k = 1, 2, \ldots$. The distance between two discrete
sets $A_1$ and $A_2$ is defined as in the following:

$$|A_1 - A_2| = \max_{x \in A_1} \min_{y \in A_2} |x - y|$$
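This distance takes, for each point of the first set, its distance to the nearest point of the second set, and then reports the largest of those values. A minimal sketch follows, assuming the Euclidean norm for the inner distance; the function names and sample sets are hypothetical.

```python
from math import sqrt


def dist(a, b):
    """Euclidean distance between two points (assumed choice of |x - y|)."""
    return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def set_distance(a1, a2):
    """max over x in a1 of (min over y in a2 of |x - y|)."""
    return max(min(dist(x, y) for y in a2) for x in a1)


first = [(0, 0), (3, 4)]
second = [(0, 0)]
forward_distance = set_distance(first, second)
reverse_distance = set_distance(second, first)
```

The two directions need not agree: in this example the distance from the first set to the second is 5.0, while the distance from the second to the first is 0.0, so the order of the arguments matters when the criterion is evaluated.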

The procedures listed in this section give an insight into the issue of 
identification of preservance input spaces given a collection of training 
inputs. As indicated in an earlier section, the identification of a
preservance input space given a collection of training inputs also,
uniquely, identifies the class of preservance weights associated with
the inputs
in the training set. The choice of weights in the distinct nodes of the 
(first layer of a layered) neural signal processor, on an identification of a 
preservance input space, reduces to an enumeration within the class of 
preservance weights. Note that the identification of preservance input 
spaces is related only to the problem of representing the input signal 
space and is not influenced by the kind of mapping realized. 


4.4 Symbolic Computation with Neural Networks 

The representation of functions (processors) in neural networks wherein
the first (input) layer incorporates preservance weights is essentially
a realization of mappings on discrete input spaces as discussed in the
preceding sections. It is not difficult to visualize a situation, typically 
when the output takes on discrete values, wherein the isolated neurons 
and the layered networks involving such neurons compute functions 
between symbol spaces. 

Considerations of computability necessitate the symbol spaces to be 
discrete. The task of incorporating discrimination on symbol spaces, 
then, reduces to an identification of a suitable mapping between the 
discrete spaces employed by neural signal processors and the symbol
spaces that support the function to be realized through the neural
signal processor. In neural networks, inter-processor interconnection
strengths are commonly identified with the mechanism of representing
available (ie, given) knowledge about the functional association between
inputs and outputs. Emphasis is laid on the aspect that the represen- 
tation is not specific to any processor in the ensemble. In the tradition 
of symbolic computation, on the other hand, the notion of strengths of 
interconnection between processing nodes is not commonly used. 

Parameterization and the consequent representational methodol- 
ogy is always considered internal to processors with the implication 
that representation of available knowledge is processor specific. This 
necessitates a reinterpretation of isolated neurons, for the purposes 
of discussing the possibility of symbolic computation with neural sig- 
nal processors, as incorporating, in its definition, the weights of the 




channels incident on the processor and considering the representation
incorporated by these weights as being specific to that processor:
however, this reinterpretation will be restricted only to this section.

Symbolic computation is studied primarily through the frameworks 
of Turing Machines, Cellular Automata etc. In a schema of intercon- 
nected processors, the basic processor is formalized as a (rewriting) 
rule specifying the value assigned to the output given a configuration 
of input values. Representation of functions in neural signal processors 
is through an assignment of appropriate values to weights and thresh- 
olds. Consequently, a study relating the values assigned to weights and 
thresholds to the structure of functions realized by neural signal proces- 
sors is essential in understanding the nature of symbolic computation 
offered by neural signal processors. 

Isolated neurons, in the context of Boolean functions, represent
linearly separable dichotomies and the representation of Boolean
functions, in general, needs at least two layers of discrimination as
established by Lippmann (1987) and others. Symbolic computation
achievable by neural signal processors, thereby, reduces to a study of
the equivalent of linear separability in the context of symbol spaces.
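The need for a second layer of discrimination is commonly illustrated with the exclusive-or function. The sketch below is an illustration only: it uses bipolar inputs and a hard limiter with the assumed convention sigma(0) = +1, searches a small hypothetical grid of weights and thresholds for a single neuron, and then builds one particular two layer realization. None of the names or numerical values come from the thesis.

```python
from itertools import product


def sign(u):
    """Bipolar hard limiter; sigma(0) = +1 is an assumed convention."""
    return 1 if u >= 0 else -1


def neuron(w, theta, x):
    """Decision of an isolated neuron with weight w and threshold theta."""
    return sign(sum(wi * xi for wi, xi in zip(w, x)) - theta)


def two_layer(x, hidden, v, theta):
    """Two layers of discrimination: hidden decisions, then one output decision."""
    y = [neuron(w, t, x) for w, t in hidden]
    return neuron(v, theta, y)


def single_neuron_realizes(target, grid):
    """Search the grid for one neuron that reproduces target on all four inputs."""
    inputs = list(product((-1, 1), repeat=2))
    for w1, w2, t in product(grid, repeat=3):
        if all(neuron((w1, w2), t, x) == target(x) for x in inputs):
            return True
    return False


def xor(x):
    return 1 if x[0] != x[1] else -1


def conj(x):
    return 1 if x == (1, 1) else -1


grid = [-2, -1, 0, 1, 2]
# Hidden nodes compute OR and NAND; the output node computes their AND.
hidden = [((1, 1), -1), ((-1, -1), -1)]
inputs = list(product((-1, 1), repeat=2))
xor_ok = all(two_layer(x, hidden, (1, 1), 1) == xor(x) for x in inputs)
```

On this grid a single neuron reproduces the conjunction but not the exclusive-or, while the two layer arrangement reproduces the exclusive-or on all four inputs.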

Symbolic spaces have only structural attributes. Thus, the discussion
has to proceed through an algebraic characterization of the discrete
input space ${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$, $w \in \Re^{n}$,
$\|w\| \neq 0$, $\tau = 1, 2, \ldots$; $n = 1, 2, \ldots$; $\zeta \in \Re_{+}$
and $\vartheta \in \Re^{n}$. Without any loss of generality, the subsequent
discussion will be based on the discrete input space
${}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$ with weights
$w \in {}_{r}\mathcal{P}_{n}(\alpha)$, for any appropriate $\alpha \in \Re_{+}$,
noting that to every weight $w \in \Re^{n}$ such that $\|w\| \neq 0$ there
exists a preservance input space
${}^{w}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$ isomorphic to
${}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$ for all admissible values of
$\tau$, $n$, $\zeta$ and $\vartheta$ and that
${}_{r}\mathcal{P}^{n}_{\tau}(\zeta, \vartheta)$, for all admissible $\zeta$ and
$\vartheta$, is a preservance input space for all weights in
${}_{r}\mathcal{P}_{n}(\alpha)$ as indicated in § 3.4 (p. 171).

A function, by definition, induces a partition on the relevant input 
space and, hence, a characterization of the members of the partition is 
equivalent to a characterization of the function: this equivalence will be 
considered, where necessary, while relating the weight and threshold 
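values to the algebraic structure of the functions realized by neural
signal processors. The partition induced on a set, say $A$, will be
denoted by $\mathfrak{P}(A)$, the distinct members being denoted by
$\mathfrak{P}_i(A)$, $i = 1, 2, \ldots, |\mathfrak{P}(A)|$:

$$\mathfrak{P}(A) = \{\mathfrak{P}_1(A), \mathfrak{P}_2(A), \ldots, \mathfrak{P}_j(A)\}, \text{ for some } j = 1, 2, \ldots,$$

$$A = \bigcup_{i=1}^{|\mathfrak{P}(A)|} \mathfrak{P}_i(A); \quad \mathfrak{P}_i(A) \cap \mathfrak{P}_j(A) = \emptyset \;\; \forall\, i, j = 1, 2, \ldots, |\mathfrak{P}(A)|,\ i \neq j.$$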

Neural signal processors will in this discussion be considered as
conforming to a processing model of two layered neural networks, ie,

$$y(x) = \sigma\Bigl(\sum_{i=1}^{m} v_i\, y_i(x) - \theta\Bigr), \quad x \in \mathcal{B}^{n}, \qquad (4.5)$$

where $y_i(x) = \sigma(w_i \cdot x - \theta_i)$ and the other terms have the
interpretation indicated earlier in this chapter. The discrete space
$\mathcal{B}^{n}$, $n = 1, 2, \ldots$, together with a partial ordering
relation, denoted by $\preceq$,¹¹ and binary operations $\vee$ and $\wedge$
which have the interpretations of supremum and infimum, respectively, is
termed a lattice: the operations $\vee$ and $\wedge$ are assumed to be
closed over $\mathcal{B}^{n}$.

¹¹A typical example is 'less than or equal to'.

It is not difficult to visualize that the lattice property of
$\mathcal{B}^{n}$ is unaffected by scaling and/or translation. However, the
possibility of a necessity for redefining the operations $\vee$ and
$\wedge$ under a scale factor $\zeta \in \Re_{+}$ and a translation
$\vartheta \in \Re^{n}$ is supported by the notation $\vee(\zeta, \vartheta)$ and
$\wedge(\zeta, \vartheta)$ respectively. The following is easily established.

PROPOSITION 4.4.1 The members in a partition of any linearly separable
dichotomy on $\mathcal{B}^{n}$ are expressed as

$$\mathfrak{P}_i(\mathcal{B}^{n}) = \bigcup_{j=1}^{k_i} \mathcal{B}_{ij},$$

where $i = 1, 2$; $k_i = 0, 1, \ldots$, $k_1 \geq k_2 \geq 0$, and each set
$\mathcal{B}_{ij}$ is isomorphic to $\mathcal{B}$ and aligned 'parallel' along
any one of the co-ordinate axes.

This statement is an immediate consequence of the notion of Cartesian
products and the geometric notion of convexity that each member of the
partition in a linearly separable dichotomy is subjected to. (See
Chapter 2 for the geometric notion of linear separability.)

Figure 4.4 illustrates some instances of partitions induced on
$\mathcal{B}^{2}$: cases (a) through (c) depict linearly separable
dichotomies and case (d) depicts a dichotomy that is not linearly
separable. (In this illustration, the binaries that form a
one-dimensional space isomorphic to $\mathcal{B}$ are marked by ovals.) The
illustration implies the following without the requirement of a proof.

PROPOSITION 4.4.2 For every dichotomy on $\mathcal{B}^{n}$, $n = 1, 2, \ldots$,
that is not linearly separable, infimum and/or supremum operations are
not satisfied by every pair of points in at least one member of the
partition.

The above two statements motivate the following. 

THEOREM 4.4.1 The following are equivalent:

1. A dichotomy on the discrete space $\mathcal{B}^{n}$ is linearly separable.

2. Every member of the partition induced by a dichotomy on
   $\mathcal{B}^{n}$ is a semi-lattice.


Before providing a proof for this statement, it is worthwhile to note
that a set, say $A$, is a semi-lattice provided the following axioms are
satisfied.

A1. The partial ordering relation is closed over $A$.

A2. At least one of the binary operations $\vee$ and $\wedge$ is closed over $A$.
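
Axiom A2 can be checked mechanically on small examples. The following is a minimal sketch, assuming that the ordering on $\mathbb{B}^n$ is the componentwise one, so that supremum and infimum are the componentwise maximum and minimum; the function names are hypothetical, and no redefinition of the operations under scaling or translation is attempted. It illustrates the axiom on two dichotomies of $\mathbb{B}^2$ and is not a verification of the theorem.

```python
from itertools import product


def join(x, y):
    """Componentwise supremum (logical OR) of two binary tuples."""
    return tuple(max(a, b) for a, b in zip(x, y))


def meet(x, y):
    """Componentwise infimum (logical AND) of two binary tuples."""
    return tuple(min(a, b) for a, b in zip(x, y))


def is_closed(points, op):
    """True if op(x, y) stays inside `points` for every pair x, y."""
    pts = set(points)
    return all(op(x, y) in pts for x in pts for y in pts)


def is_semi_lattice(points):
    """Axiom A2: at least one of join and meet is closed over the set."""
    return is_closed(points, join) or is_closed(points, meet)


def partition(dichotomy, n):
    """Split B^n into the two members induced by a {0, 1}-valued dichotomy."""
    members = ([], [])
    for x in product((0, 1), repeat=n):
        members[dichotomy(x)].append(x)
    return members


and_members = partition(lambda x: x[0] & x[1], 2)  # linearly separable
xor_members = partition(lambda x: x[0] ^ x[1], 2)  # not linearly separable

print([is_semi_lattice(m) for m in and_members])  # [True, True]
print([is_semi_lattice(m) for m in xor_members])  # [True, False]
```

For the exclusive-or dichotomy the member {(0, 1), (1, 0)} is closed under neither operation, while the member {(0, 0), (1, 1)} is closed under both.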

Proof: The proof will be given in four stages based on the different
kinds of partitions induced by linearly separable functions as illustrated
in Figure 4.4 (p. 222). Though the illustration shows partitions on
$\mathbb{B}^2$, the types of partitions and the arguments are sufficiently general
to be applicable to partitions on $\mathbb{B}^n$ for values of $n$ other than 2.

Case 1. Trivial functions: one member of the partition is null

See (c) of Figure 4.4. In this case $\varphi(\mathbb{B}^n) = \{\emptyset, \mathbb{B}^n\}$. Both the
members of the partition are lattices and the statement is true.

Case 2. Non-trivial functions: one member of the partition is a singleton
See (b) of Figure 4.4. The partition in this case is expressed, in general,
as $\varphi(\mathbb{B}^n) = \{\{x_0\}, \mathbb{B}^n \setminus \{x_0\}\}$, where $x_0 \in \mathbb{B}^n$. It is
immediately apparent that the singleton set $\{x_0\}$ is a lattice, and
thereby a semi-lattice, as all the axioms (of a lattice) are trivially
satisfied. Considering the other member of the partition, the geometrical
equivalent of the notion of linear separability implies that $x_0$ is
either the maximal or the minimal element of $\mathbb{B}^n$, ie,

$$\forall x \in \mathbb{B}^n: \quad x_0 = \vee(x, x_0) \quad \text{or} \quad x_0 = \wedge(x_0, x).$$

This assures that $\mathbb{B}^n \setminus \{x_0\}$ is a semi-lattice.

Case 3. Non-trivial functions: partition members are neither null nor singletons
See (a) of Figure 4.4. Each member of the partition is expressed as a
union of one-dimensional spaces, each isomorphic to $\mathbb{B}$, ie,

$$\varphi_i(\mathbb{B}^n) = \bigcup_{j=1}^{k_i} \mathbb{B}_{ij},$$

where $\mathbb{B}_{ij}$, $j = 1, 2, \ldots, k_i$, is isomorphic to $\mathbb{B}$ for some
$k_i = 1, 2, \ldots$, such that for every $\mathbb{B}_{ij_1} \subset \varphi_i(\mathbb{B}^n)$ there
exists $\mathbb{B}_{ij_2} \subset \varphi_i(\mathbb{B}^n)$, $j_2 \ne j_1$, and
$\mathbb{B}_{ij_1} \cap \mathbb{B}_{ij_2} \ne \emptyset$.
This aspect ensures that

7^ 0 ar = a(x2,^3) , 

kl} = k2.2l3} = 

Note the structure of the members of the partition. While the operation
$\vee$ is defined over all pairs of points in one member, say $\varphi_i(A)$, this
operation is not defined over all pairs of points in the complementary
member $\varphi_{\bar{\imath}}(A)$. In a similar way the operation $\wedge$ is defined over
all pairs of points in $\varphi_{\bar{\imath}}(A)$ and is not defined over all pairs of
points in $\varphi_i(A)$. A simple justification for this situation lies in the
fact that under the binary operations $\vee$ and $\wedge$, while the (global)
supremum belongs to $\varphi_i(A)$, for some $i = 1, 2$, the global infimum
belongs to the complementary member $\varphi_{\bar{\imath}}(A)$.

Case 4. Functions not linearly separable 

See (d) of Figure 4.4. From Proposition 4.4.2 (p. 223) it is clear that at
least one member of the partition fails to satisfy the axioms of a
semi-lattice.

□ 
The discrete space $\mathcal{P}^n_r(C, \vartheta)$, $n = 1, 2, \ldots$; $r = 1, 2, \ldots$;
$C \in \mathbb{R}_+$ and $\vartheta \in \mathbb{R}^n$, being derived from $\mathbb{B}^n$ through scaling
and translation, Theorem 4.4.1 (p. 223) is easily extended to the case of
dichotomies over $\mathcal{P}^n_r(C, \vartheta)$ as the following.

Theorem 4.4.2 Every member of the (binary) partition induced by a
linearly separable dichotomy on $\mathcal{P}^n_r(C, \vartheta)$, $n = 1, 2, \ldots$;
$r = 1, 2, \ldots$; $C \in \mathbb{R}_+$ and $\vartheta \in \mathbb{R}^n$, is a semi-lattice.

If the radix $r$ is considered not to be restricted to 2, the following is
easily established.

Theorem 4.4.3 All members in a (binary) partition induced by a
linearly separable dichotomy on $\mathcal{P}^n_r(C, \vartheta)$, $r = 1, 2, \ldots$;
$n = 1, 2, \ldots$; $C \in \mathbb{R}_+$ and $\vartheta \in \mathbb{R}^n$, are semi-lattices.

It is worthwhile to note that for all the admissible values of $r$, $n$,
$C$ and $\vartheta$, $\mathcal{P}^n_r(C, \vartheta)$ forms a (truncated) set of lattice points
(Erdos, Gruber & Hammer, 1989) and hence is a semi-lattice. Here again
the partitions are of the same four kinds as discussed in the proof of
Theorem 4.4.1.

Theorem 4.4.4 The following statements are equivalent:

1. A bipolar dichotomy on $\mathcal{P}^n_r(C, \vartheta)$ is linearly separable.

2. Every member of $\varphi(\mathcal{P}^n_r(C, \vartheta))$ is a semi-lattice.
Theorem 4.4.4 (p. 226), which captures the essence of the algebraic
characterization of linearly separable dichotomies on discrete spaces, is
effectively a definition of linear separability for mappings defined on
symbolic spaces. (Note that the notion of linear separability, as
considered presently in the literature, makes sense only for dichotomies.)
For the sake of completeness, the following definition of linear
separability in connection with symbolic spaces is being provided.

Definition 4.4.1 A dichotomy over a symbol space, itself embedded
in a (semi) lattice, is linearly separable if each component in the partition
induced by the dichotomy on the symbol space is a sub-semi-lattice.


4.5 Summary 

Single layered neural signal processors, the simplest non-trivial
interconnected ensemble of neurons, in a continuation of the discussion of
preservation of discrete spaces in one-dimensional spaces, have been
shown to be adequate in representing all dichotomies and functions of
interest on discrete spaces. While the adequacy of single layer neural
processing structures and the associated exponential dependence
of the number of processing nodes on the input space dimensionality
have long been known in the literature, a discussion based on the notion
of preservance weights and preservance input spaces, as carried out
in this chapter, has shed some light on the algebraic characteristics
resulting from a restriction of the input space.

The number of processing nodes required to achieve the desired
representation, in general, shows an exponential dependence on the
cardinality of the discrete space under consideration, and given the
construction of discrete spaces $\mathcal{P}^n_r(C, \vartheta)$ the number of processing
nodes becomes undesirably large as a function of the input space
dimensionality $n$ as well as $r$. The cardinality of this discrete input
space is unaffected by the specific choices of non-null weight
$w \in \mathbb{R}^n$, $C \in \mathbb{R}_+$ and $\vartheta \in \mathbb{R}^n$, and it is to be expected that the
number of processing nodes is invariant to these parameters.

Learning (with generalization) of the weights, thresholds, bias ($\vartheta$)
and co-efficients of linear combination ($v$) has been shown to relate to
the simple procedure of learning in isolated neurons; the additional
requirement of specifying $v$ is accomplished through a solution to a
linear system of equations. Motivated by the urge to seek layered neural
information processing structures capable of function representation
with complexities less demanding than that of single layer neural signal
processors, representation in multi layered neural signal processors has
been investigated. While these network structures do not enlarge the
space of functions represented relative to that seen in single layered
neural signal processors, realization of given dichotomies (functions) is
accommodated with fewer processing nodes.
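
The step of specifying $v$ through a linear system can be sketched as follows. This is a minimal illustration, assuming a single layer with one hand-picked sigmoidal node per point of $\mathbb{B}^2$ (so the number of nodes equals the cardinality of the input space); the weights and thresholds below are hypothetical choices made for the example and are not the preservance weights of this chapter.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve a.v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * v[c] for c in range(r + 1, n))
        v[r] = (m[r][n] - tail) / m[r][r]
    return v


# The four points of B^2 and the exclusive-or targets to be represented.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0.0, 1.0, 1.0, 0.0]

# Hypothetical first layer: node j has bipolar weights and a threshold
# chosen so that it responds most strongly to points[j].
weights = [[2 * p - 1 for p in pt] for pt in points]
thresholds = [sum(pt) - 0.5 for pt in points]


def hidden(x):
    """Responses of all first-layer nodes to the input x."""
    return [sigmoid(4.0 * (sum(w * xi for w, xi in zip(ws, x)) - th))
            for ws, th in zip(weights, thresholds)]


H = [hidden(x) for x in points]  # one row per training input
v = solve(H, targets)            # coefficients of the linear combination


def processor(x):
    """Linear combination of the node responses."""
    return sum(vj * hj for vj, hj in zip(v, hidden(x)))


for x, y in zip(points, targets):
    print(x, y, round(processor(x), 6))
```

With the first layer fixed, the coefficients follow from one linear solve; the matrix of node responses here is strictly diagonally dominant, so the system has a unique solution.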
Function realization, or processor representation in neural
networks, is interpreted to mean an identification of a suitable preservance
input space given examples of processing followed by an approximation
of the desired processing functionality as functions on the preservance
input space. Identification of a preservance input space given a
collection of training inputs also uniquely identifies the class of
preservance weights. The issue of choosing weights in the distinct nodes of
the first layer, then, reduces to an enumeration in the class of
preservance weights. Representation of the input signal space is accomplished
through an identification of a preservance input space given a training
set. This implies that learning or representation in neural networks
is not the same as an incorporation of values in a look-up table (or
memory) as suggested by Aleksander (1983b) and Stonham (1983).

Neural information processing having been discussed in the previous
and the present chapters on discrete spaces, a natural curiosity of
seeking the possibility of realizing symbolic computation through
layered neural information processing structures has motivated a study
of neural signal processors in terms of the partitions induced on the
(discrete) input space. An inquiry into the algebraic properties of the
partitions leads to the conclusion that linear separability of a given
dichotomy is equivalent to the members of the partitions induced by
the given dichotomy being semi-lattices. This alternative definition
of linear separability allows the notion of isolated neurons and,
consequently, neural information processing structures to be defined on
symbol spaces; however, in order that the notions of learning and, in
particular, generalization be valid on symbol spaces, an equivalent of
linear combination, as an operation on symbols, is imperative.

The algebraic characterization of linear separability, together with
the notion of neural signal processing architectures minimal to a given
dichotomy, prompts an inquiry, not unnaturally, into the nature of the
representational paradigm operative in neural information processing.
In particular, it is of interest to be able to identify an axiomatic
framework that would provide an insight into the kinds of processing that
can be expected of neural signal processors. Noting that representation
of functions in neural signal processing structures involving a degree
of layering larger than unity entails a reduction in the requirement of
processing nodes, the character of representation, in the sense of the
nature and degree of approximation, of layered neural signal processors
needs to be investigated. Chapters 5 and 6 will be devoted to these and
related inquiries.



Chapter 5 


Neural Signal Processing:
Representational Issues


Marco Polo describes a bridge, stone by stone. 

'But which is the stone that supports the bridge?' Kublai Khan 
asks. 

'The bridge is not supported by one stone or another,' Marco
answers, 'but by the line of the arch that they form.'

Kublai Khan remains silent, reflecting. Then he adds : 'Why do you 
speak of the stones? It is only the arch that matters to me.' 

Polo answers: 'Without stones there is no arch.'

Italo Calvino
in Invisible Cities,
Picador, London, 1979.
Signal processor realization, especially of the nonlinear kind, through
artificial neural networks is centered in the model-free classification/
estimation aspect of the paradigm, which essentially simplifies the
operational nature and allows for a representation of input-output
dependencies to be established through an appropriate number of
non-exhaustive examples. Processor representation has been looked upon
in neural networks, as in signal processing, as a situation of
approximating a function; however, the deviation from conventional signal
processing approaches lies in that the function approximated is
expected to have connotations of perceptual characteristics, specifically
categorization/estimation, and the function is commonly described, not
analytically, but in terms of a list, generally not exhaustive, of typical
correspondences between inputs and outputs: the study of representation
in Chapters 3 and 4 is based on these aspects.

Function approximation in neural networks, similar to that in
conventional signal processing, is viewed as a search for the closest
member (in the sense of an appropriate measure of satisficability[1]) in the
linear span of an appropriately chosen set of 'basis functions,' thereby
suggesting close relationships between function realization in neural
networks and integral transforms. The 'basis functions,' though
dependent on the specific class of functions being approximated (as decided
by the particular repertoire of input-output associations provided in the
training set), are at a conceptual level, particularly in layered neural
networks, synthesized from a family of activation functions which is, in
general, shared by a large class of training data: the family of activation
functions is independent of the family of functions to be approximated.

[1] Refer § 2.1 for the notion of satisficability employed in search.
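
The search described above can be written compactly. As a sketch only, assuming basis functions $\phi_1, \ldots, \phi_m$, coefficients $v_j$ and a distance-like measure $d$ (all three symbols are introduced here for illustration and are not the notation of this thesis), the approximant $\hat{f}$ of a desired map $f$ may be taken as

```latex
\hat{f} = \sum_{j=1}^{m} v_j^{\ast}\, \phi_j ,
\qquad
v^{\ast} = \arg\min_{v \in \mathbb{R}^m} \;
d\Big( f ,\; \sum_{j=1}^{m} v_j\, \phi_j \Big)
```

When the measure is one of satisficing rather than of strict optimality, any $v$ for which $d$ falls below an acceptable level may presumably serve in place of the exact minimizer.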

In neural signal processing, the specific choice of 'basis functions'
is determined from the available knowledge of the desired map by a
procedure of parameter location, also known as learning, or training.
Clustering, in the abstract sense of capturing the relevant features
needed for categorization, estimation, or function approximation, is the
essence of realizing desired maps whereby the 'basis functions' are
constrained to be nonlinear and to have localized influence, not necessarily
in the sense of having compact supports. The activation functions ($\sigma$) in
the processing nodes, too, are, thereby, required to exhibit a nonlinear
functional nature: the kind (or type) of nonlinearity is crucial to the
discriminatory power of a neural signal processor.

Monotonicity in the activation functions, components in the synthe- 
sis of basis functions (of local influence), is not considered favorable 
in function approximation in view of the complexity in generating lo- 
cal functions as a linear combination of non-local functions, essentially 
an algorithmic (convergence) consideration. However, these non-local 
functions, sigmoidal functions being a typical example, are simple, ade- 
quate for function approximation and considered to be biologically plau- 
sible formalisms of the discriminatory requirement in decision making. 
The activation functions, though monotonic, have local variations. 
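
The synthesis of a local function from non-local, monotonic ones can be sketched in one dimension. The example below is a minimal illustration, assuming the logistic function as the sigmoid; the name `bump` and its parameters are hypothetical. The difference of two shifted sigmoids is close to one near the centre and close to zero far from it, although each sigmoid on its own has influence over half of the real line.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: monotonic, non-local, with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def bump(x, centre=0.0, half_width=1.0, steepness=10.0):
    """Local function built as a difference of two shifted sigmoids."""
    rising = sigmoid(steepness * (x - (centre - half_width)))
    falling = sigmoid(steepness * (x - (centre + half_width)))
    return rising - falling


for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"{x:5.1f}  {bump(x):.4f}")
```

Two nodes are spent on one local function in one dimension; the cost of the same construction in several dimensions is one way to read the remark on complexity above.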
While the representation potential and characterization of the
existence of representation in neural signal processors with a single layer
of decision making have been studied in sufficient detail, not enough
attention has been riveted on the capability of multi-layered neural signal
processors, notwithstanding the algorithmic inadequacies of such
processor realization schemes. Partly hindered, in analytical treatment,
by the intervening nonlinearities between (adjacent) layers of linear
filtering, the role of multiple layers of decision making, particularly
with bounded number of processing nodes, on function approximation
capabilities is not known in desired detail. An immediate consequence
of this inadequacy is that no satisfactory criterion is yet available for
deciding the number of layers[2] to be used in a neural signal processor,
a requirement of processor design in the neural processing paradigm.
I will initiate the discussion of representational issues in neural
signal processing by trying to provide an understanding of the
underlying (operational) paradigm in artificial neural networks. This
exercise is important as neural networks, in the literature, have been
compared at a functional level with approaches based on Gabor
functions (Daugman, 1988), ridge functions (Ya Lin & Pinkus, 1993),
wavelets (Zhang & Benveniste, 1991; Pati & Krishnaprasad, 1993),
and matched filtering (Grant & Sage, 1986), and have been applied in
many signal processing situations, eg, image data compression,
routing and congestion control of telecom traffic, decoding in code-division
multiple access (Aazhang, Paris & Orsak, 1992), etc. Moreover, the
superiority, in modeling, of neural networks over conventional approaches
has often been reported in the literature.

[2] While this statement is applicable largely to neural signal processors
realized through activation functions of the sigmoidal (including hard
limiter) type, and it is unlikely that multiple layers of decision making
would be needed, for satisfactory function approximation, with radial (or
elliptical) basis functions, knowledge of the role of multiple layers of
decision making would be useful, in general, in explorations of processor
types capable of capturing our (current) understanding of perceptual
abilities and requirements.

Neural signal processors are shown in this chapter to provide
statements of aggregations of decisions taken on features extracted from
patterns presented to the network: the features are related to
patterns through integral transforms and the averaging process allows
concepts to be aggregated from decisions on relevant features. This
aspect of the underlying paradigm of artificial neural networks enables
a discovery of similarities and dissimilarities with conventional signal
processing approaches and assures a possibility of a complete neural
basis for realization of nearly all aspects of a signal processor; the latter
assurance can, however, be given only when a multi-layered scheme
is adopted for the approximation task at hand. Cognitive scientists
and neuro-anatomists could find this perspective of neural networks,
incorporating feature extraction, decisions on feature spaces, and
aggregation of decisions to form concepts, useful in exploring the specific
kinds of signal processing that are carried out in the nervous system.

A study of the representational potential offered by multi-layered
neural signal processors being the main topic of this chapter, I begin
with the definition of neural signal processors in § 5.1 and establish
that these neural signal processors are capable of representing
continuous functions on finite dimensional spaces with arbitrary accuracy:
neural signal processors, though introduced in Chapter 4, are defined
once again to enable an investigation into the kinds of signal
processing situations represented. The functional character of neural signal
processors is studied in § 5.2 (p. 249) and in this section the axioms of
neural signal processing are formulated.

In § 5.3 (p. 276) I formulate the representational paradigm operative
in neural signal processors as being a nonlinear association between
integral transforms: this paradigm introduces the metaphor of integral
transform kernels being the internal representation of the knowledge
available (or given) through a training set. § 5.4 (p. 284) is a study of
representation in neural signal processors, in particular, the character
of representation, wherein I suggest an interpretation of a function
representation theorem of Kolmogorov in the context of neural signal
processing. I also establish that as the depth of layering increases, the
degree of smoothness with which a multi-variate function is realized
increases: this in turn suggests an explanation for the relatively higher
representational complexity of single layer neural signal processors as
compared to the multi-layered variety.
5.1 Neural Signal Processors: Definition and representational potential


In § 2.2, the formal models for neurons and neural networks have been
introduced and in Chapter 4 the functional model of layered neural
signal processors has been considered. These models have been suggested
to address issues of information storage (ie, representation) and
processing that dominate any discussion of automated intelligence viewed
in the perspective of information processing. Neural signal processors
are formulated as linear combinations of neural responses to overcome
the inevitable restriction of the output in the formal model of neurons
(purporting to capture the firing frequency of the action potential of
biological neurons) being limited to a specific (proper) subset of $\mathbb{R}$, the
real number field, typical examples being closed intervals like $[0, 1]$ or
$[-1, 1]$ in the case of continuous real valued neurons, and sets like $\{0, 1\}$
or $\{-1, 1\}$ with binary (real) valued neurons.

Multi-layered neural networks form the basis of neural signal pro- 
cessors and the possibility of representing functions in feed-forward 
multi-layered neural signal processors with fewer processing nodes 
than necessary in the case of single layer neural signal processors en- 
courages a consideration of more general schemes of neural processing. 
Noting that the model of processing in networks of neurons suggested
by Equation 2.16 (p. 67) is a unified statement of neural network
architectures, the following is suggested.

Definition 5.1.1 The class of scalar neural signal processors of
type-$k$, $k = 1, 2, \ldots$, denoted by ${}^k\mathfrak{N}$, is given by the functional form

    (5.1)

where the symbols have the same connotations[3] as indicated in § 2.2.
The processor, and its functionality, are denoted with appropriate
indices and dependencies.[4]

(Note that this definition is the same as the functional form in
Equation 2.16 (p. 67) except for the additional statement expressing the
mechanism of deriving the required scalar output from the array of
decisions and subsumes the definition of neural signal processors
introduced in Chapter 4.)

The above definition can be effortlessly extended to the class of
vector neural signal processors by the assignment

    (5.2)

(where the elements of the vector are defined as in Equation 5.1), though
in signal processing, often, the processor is scalar valued, and, unless
otherwise mentioned, neural signal processors will be considered to be
scalar valued in the sequel. In passing, it is worthwhile to note that
${}^k\mathfrak{N}$ is a function[5] space described by

$${}^k\mathfrak{N} = \bigcup_{m^{(k)}} {}^k\mathfrak{N}_{m^{(k)}}, \qquad (5.3)$$

where ${}^k\mathfrak{N}_{m^{(k)}}$, with $m^{(k)} = [m_0, m_1, \ldots, m_k, m_{k+1}]^T$,
denotes the family of functions realized by a type-$k$ neural signal
processor with the number of nodes in the corresponding layers[6] given by
the elements of the vector $m^{(k)}$.

[3] Note that thresholding is incorporated in the abstract translation
function.

[4] Note that the neural computational process defined as a transformation
on an $m_0$-dimensional pattern space, and indexed in a (directed)
one-dimensional space signified to have the interpretations of time, can
easily be extended as a neural computational field wherein the indexing
space is of dimensionality greater than one. However, specific relations
of (partial) ordering need to be imposed.

[5] More precisely, as indicated later, ${}^k\mathfrak{N}$ is an operator space.

[6] $m_0$ denotes the number of inputs, ie, the number of elements in $x$, and
$m_{k+1} = 1$ in the case of scalar neural signal processors.
Figure 5.1 illustrates the processing hierarchy in a neural signal
processor of type-$k$. Neural signal processors, of all types, have a
taxonomy, and architectural peculiarities, similar to neural networks[7] in
view of the above definition. Thus, neural signal processors can be
continuous or discrete (generally binary) valued, feed-forward, recurrent
or competitive, with additive or multiplicative (shunting) dynamics in
the individual processing nodes. In the above definition, the reasons for
associating the number of decision-making layers, ie, $k$, with the type
number of neural signal processors and the necessity for the notion of
types will become clear in the ensuing discussion.
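
The processing hierarchy of the feed-forward, steady-state case can be sketched as below. This is a minimal illustration under stated assumptions: each decision-making layer performs linear filtering, a translation by a threshold, and a logistic activation, and the scalar output is a linear combination of the last layer's responses. The function names, the numerical weights and the choice of a type-2 processor are hypothetical and do not reproduce Equation 5.1.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(x, weights, thresholds):
    """One decision-making layer: linear filtering, translation, activation."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) - th)
            for row, th in zip(weights, thresholds)]


def type_k_processor(x, layers, v, bias=0.0):
    """Scalar output of a feed-forward processor with k = len(layers) layers.

    `layers` is a list of (weights, thresholds) pairs, one per decision-making
    layer; `v` holds the coefficients of the final linear combination.
    """
    y = list(x)
    for weights, thresholds in layers:
        y = layer(y, weights, thresholds)
    return sum(vi * yi for vi, yi in zip(v, y)) - bias


# Hypothetical type-2 processor with node counts (m0, m1, m2, m3) = (2, 3, 2, 1).
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.25, -0.5]),  # 2 -> 3
    ([[1.0, 1.0, -1.0], [-2.0, 0.5, 1.0]], [0.1, -0.1]),          # 3 -> 2
]
v = [3.0, -2.0]

output = type_k_processor([0.2, 0.7], layers, v)
print(output)
```

Because the final step is a linear combination, the output is not confined to the range of a single neuron; with the coefficients above it can lie anywhere in the open interval (-2, 3).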

Present research on neural network based signal processing and
function approximation has focused extensively on type-1 neural
signal processors (ie, ${}^1\mathfrak{N}$). According to a theorem due to Cybenko
(1989), type-1 neural signal processors with sigmoidal activation
functions realize continuous functions with arbitrary accuracy. This
result has also been reported by several others in the literature (see eg,
Vepsalainen, 1991; Mhaskar, 1993; Ya Lin & Pinkus, 1993). We
note, in passing, that an isolated neuron is expressible as a type-$k$
neural signal processor for all values of $k$ ($k = 1, 2, \ldots$), ie, isolated
neurons belong to the space of processors realized from them.
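
What arbitrary accuracy means for a type-1 processor can be illustrated in one dimension. The sketch below is an assumption-laden example, not Cybenko's argument (which is non-constructive): it builds a staircase from steep logistic sigmoids, with the output coefficients set to the increments of the target function, and measures the largest deviation on a grid. The function names and the target $x^2$ on $[0, 1]$ are hypothetical choices.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def staircase_processor(f, n_nodes, steepness=40.0):
    """Type-1 processor on [0, 1]: a bias plus n_nodes weighted steep sigmoids."""
    knots = [i / n_nodes for i in range(n_nodes + 1)]
    centres = [(a + b) / 2 for a, b in zip(knots, knots[1:])]
    v = [f(b) - f(a) for a, b in zip(knots, knots[1:])]  # output coefficients
    gain = steepness * n_nodes                            # shared input weight

    def g(x):
        return f(knots[0]) + sum(vi * sigmoid(gain * (x - c))
                                 for vi, c in zip(v, centres))
    return g


def max_error(f, g, samples=1001):
    """Largest deviation between f and g on a uniform grid over [0, 1]."""
    return max(abs(f(i / (samples - 1)) - g(i / (samples - 1)))
               for i in range(samples))


def target(x):
    return x * x


err_50 = max_error(target, staircase_processor(target, 50))
err_200 = max_error(target, staircase_processor(target, 200))
print(err_50, err_200)
```

Increasing the number of nodes reduces the error, at the price of a node count that grows with the accuracy demanded.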

On being introduced to the definition of neural signal processors,
one of the first questions that crops up concerns the potential for
representation. Noting that such questions have, indeed, been fruitfully
answered in relation to neural networks, I will now rephrase existing
statements (applicable to type-1 processors) in the notation adopted in
this thesis. Some minor generalizations of the statements have also
been attempted along with the rephrasing. Also note that ${}^k\mathfrak{N}$ denotes
the space of functions realized by type-$k$ neural signal processors. For
convenience of analysis, I denote

$${}^k\mathfrak{N}(t) = \{\, \mathfrak{n}^{(k)}(x, t) \mid x \in \mathbb{R}^n \,\}, \quad \text{for all } k = 1, 2, \ldots;\; t \in \mathbb{R}_{0,+},$$

($\mathfrak{n}^{(k)}$ denoting the response of a type-$k$ processor)
to identify the class of functions realized by a neural signal processor,
as a consequence of possible evolution, at time $t$, ${}^k\mathfrak{N}(t) \subset {}^k\mathfrak{N}$:
non-evolutionary processors are characterized by ${}^k\mathfrak{N}(t)$ being invariant
in $t$, and for this class of processors, ${}^k\mathfrak{N}(t) = {}^k\mathfrak{N}$, for all
$t \in \mathbb{R}_{0,+}$. In a similar fashion ${}^k\mathfrak{N}_{m^{(k)}}(t)$ will be used to denote the
class of evolutionary neural functions realized by processors whose nodes
are as specified in $m^{(k)}$.

[7] See Chapter 2 for the taxonomical and architectural details in neural
networks.

Sigmoidal functions, the more popular and traditional choice of
activation functions, establish one-one correspondences between $\mathbb{R}$ and
$[C_-, C_+] \subset \mathbb{R}$, the latter being the range space of (isolated) neurons,
and it is of interest to note the following.


Proposition 5.1.1 For every family of continuous functions $f: \mathbb{R}^n \to \mathbb{R}$,
the family being indexed on $\mathbb{R}_{0,+}$ (responses of members of the family
are denoted by $f(x, t)$, $x \in \mathbb{R}^n$, $t \in \mathbb{R}_{0,+}$), such that the mapping $f$, at
every point in $\mathbb{R}_{0,+}$, is surjective (on $\mathbb{R}$),

$$\int_{\mathbb{R}^n} |\sigma(f(x, t))| \, d\mu(x) = 0 \;\Longrightarrow\; \mu = 0, \quad \text{for any } t \in \mathbb{R}_{0,+},$$

where $\mu$ is a finite measure on $\mathbb{R}^n$ and $\sigma$ is a sigmoidal function.

Proof: This statement reduces to the ($L^1(\mathbb{R})$) norm of $\sigma$ noting that the
support of $\sigma$, for all functions $f(\cdot, t)$, at every $t \in \mathbb{R}_{0,+}$, is all of
$\mathbb{R}$. Since the hyperbolic tangent function from which the sigmoidal
function is derived is not an integrable function, ie,
$\tanh(\cdot) \notin L^1(\mathbb{R})$, the desired result follows immediately.

□ 
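
The one-one correspondence between the real line and the range of a sigmoid, noted before Proposition 5.1.1, can be sketched as follows. This is a minimal illustration, assuming a sigmoid obtained by shifting and scaling the hyperbolic tangent; the parameter names `c_minus` and `c_plus` stand for $C_-$ and $C_+$, and the function names are hypothetical.

```python
import math


def sigmoid(x, c_minus=0.0, c_plus=1.0):
    """tanh-derived sigmoid mapping the real line into (c_minus, c_plus)."""
    return c_minus + (c_plus - c_minus) * (1.0 + math.tanh(x)) / 2.0


def inverse_sigmoid(y, c_minus=0.0, c_plus=1.0):
    """Inverse map, defined only for c_minus < y < c_plus."""
    return math.atanh(2.0 * (y - c_minus) / (c_plus - c_minus) - 1.0)


for x in (-3.0, 0.0, 2.5):
    y = sigmoid(x, -1.0, 1.0)
    print(x, y, inverse_sigmoid(y, -1.0, 1.0))
```

For this particular sigmoid the correspondence is with the open interval: the end points $C_-$ and $C_+$ are approached as $x \to \mp\infty$ but are not attained at any finite $x$.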

Noting that type-1 non-evolutionary neural signal processors defined
on $I^n$, the unit hypercube, with $[C_-, C_+] = [0, 1]$, are dense in the
space of continuous functions as established by Cybenko (1989), the
following characterization of type-$k$ neural signal processors, implied
by Proposition 5.1.1, is necessary.

Theorem 5.1.1 If the activation function $\sigma$ is sigmoidal in the weaker
sense of $\sigma: \mathbb{R} \to [C_-, C_+]$ such that $\sigma \in C(\mathbb{R})$, ie, $\sigma$ is continuous, and

$$\sigma(x) \to \begin{cases} C_- & \text{as } x \to -\infty, \\ C_+ & \text{as } x \to +\infty \end{cases}$$

(for some, appropriately chosen, $C_-, C_+ \in \mathbb{R}$ such that $C_- < C_+$), then
for all $k = 1, 2, \ldots$, and for all $t \in \mathbb{R}_{0,+}$, ${}^k\mathfrak{N}(t)$ is dense, with respect
to any finite measure, in the space $C(\mathbb{R}^n)$ of continuous $n$-variate real
functions.

Proof: The arguments in this proof have been considerably influenced
by the proof provided in op cit for the case that reduces to the earlier
mentioned characterization of type-1 neural signal processors. Observe
that

$${}^k\mathfrak{N}(t) = \{\, \mathfrak{n}^{(k)} : \mathbb{R}^n \times \{t\} \to \mathbb{R} \,\} \subset C(\mathbb{R}^n), \quad x \in \mathbb{R}^n, \text{ for all } t \in \mathbb{R}_{0,+} \text{ and } k = 1, 2, \ldots,$$

where $\mathfrak{n}^{(k)}$ (with appropriate subscripting indices) is given by
Equation 5.1 (p. 238). ${}^k\mathfrak{N}(t)$, $k = 1, 2, \ldots$, $t \in \mathbb{R}_{0,+}$, is a linear
subspace of the linear space $C(\mathbb{R}^n)$, and incorporates several cases, ie, that
of deterministic processors in steady-state, processors with additive, or
shunting, dynamics with possible stochastic interpretation to the
activation functions, etc. The claim is that the closure
$\overline{{}^k\mathfrak{N}(t)}$ of ${}^k\mathfrak{N}(t)$ is all of $C(\mathbb{R}^n)$, $k = 1, 2, \ldots$, $t \in \mathbb{R}_{0,+}$.
As the subsequent argument is independent of $k$, the type number of neural
signal processors, and $t$, the temporal index, quantifications on $k$ and $t$
will not be indicated unless absolutely necessary.
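Let, if possible, $\overline{{}^k\mathfrak{N}}$, a closed proper subspace of $C(\mathbb{R}^n)$, be
different from $C(\mathbb{R}^n)$. By the Hahn-Banach theorem (cf, Kreyszig, 1978;
Rudin, 1986) for a bounded linear functional, say $f$, defined on
$\overline{{}^k\mathfrak{N}}$ and a sub-linear functional $p$ on $C(\mathbb{R}^n)$ satisfying
$|f(\mathfrak{n}^{(k)})| \le p(\mathfrak{n}^{(k)})$ $\forall\, \mathfrak{n}^{(k)} \in \overline{{}^k\mathfrak{N}}$, there exists a
bounded linear extension, say $\tilde{f}$, from $\overline{{}^k\mathfrak{N}}$ to $C(\mathbb{R}^n)$ such
that $\tilde{f} = f$ on $\overline{{}^k\mathfrak{N}}$ and $|\tilde{f}(h)| \le p(h)$
$\forall\, h \in C(\mathbb{R}^n)$. In particular, if we choose $f$ to be 0 on all of
${}^k\mathfrak{N}$ and $\overline{{}^k\mathfrak{N}}$, then we can expect an extension $\tilde{f} \ne 0$ on
$C(\mathbb{R}^n)$.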

By the Riesz representation theorem (op cit) this bounded linear
functional, $\tilde{f}$, is expressed as the (functional) inner product

$$\tilde{f}(h) = \int_{\mathbb{R}^n} h(x, t) \, d\mu(x), \quad \forall\, t \in \mathbb{R}_{0,+},$$

for some (signed) measure $\mu$ of bounded variation on $\mathbb{R}^n$ and for all
$h \in C(\mathbb{R}^n)$. In view of the nature of the members of ${}^k\mathfrak{N}(t)$, and as the
coefficients of the linear combination are not all zero, it is essential that

$$\int_{\mathbb{R}^n} |\sigma(\eta_i^{(k)}(x, t))| \, d\mu(x, t) = 0, \quad \forall\, t \in \mathbb{R}_{0,+},$$

for all instances of $\eta_i^{(k)} \in C(\mathbb{R}^n)$, the argument of $\sigma$ at node $i$.
However, when sigmoidal activation functions are used, as already
established in Proposition 5.1.1 (p. 242), this condition implies that
$\mu = 0$, ie, the functional $\tilde{f} = 0$ on $C(\mathbb{R}^n)$, a situation contradicting
the one we are interested in, thereby establishing the denseness of
${}^k\mathfrak{N}(t)$ in $C(\mathbb{R}^n)$, for all $t \in \mathbb{R}_{0,+}$.

□ 
Corollary to Theorem 5.1.1 With sigmoidal activation functions,
and arbitrary compact subsets $\mathcal{C} \subset \mathbb{R}^n$, $\forall\, k = 1, 2, \ldots$ and for all
$t \in \mathbb{R}_{0,+}$, ${}^k\mathfrak{N}(t)$ is dense in $C(\mathcal{C})$ with respect to any finite measure
on $\mathcal{C} \subset \mathbb{R}^n$.

The proof of the above statement is on the same lines as that for
Theorem 5.1.1 (p. 243) except that the Riesz representation theorem on
compact sets (Kreyszig, 1978, p. 227) is invoked to establish the
contradiction and, hence, is not being detailed. Note that in this
discussion, $\mathbb{R}^n$ refers to an embedding of the (non-null) observation space
(which is the input space $X$ augmented, through a Cartesian product, with
the fraction of the output space fed back through lateral interactions and
recurrent connections) and $\mathbb{R}_{0,+}$ allows for a specification of the time
index $t$ at which all computation, via the neural processor, is considered.

It is important to note that the above theorem, characterizing the
existence of representation in (multi-layered) neural signal processors,
requires the activation function $\sigma$ to have the specific property that

$$\int_{\mathbb{R}^n} |\sigma(\eta_i^{(\ell)}(x, t))| \, d\mu(x) \ne 0 \quad \text{for all finite measures } \mu,\; t \in \mathbb{R}_{0,+}, \qquad (5.4)$$

ie, the norm of $\sigma$ evaluated over all instances of $\eta_i^{(\ell)}$ (the argument
of $\sigma$ at node $i$ of layer $\ell$) for each $i$, $\ell = 1, 2, \ldots, k$, be
non-vanishing, which indicates that the density property is not a feature of
the sigmoidal function alone, and that we can expect several other types of
activation functions resulting in a similar representation potential.
However, as indicated by Leshno, Ya Lin, et al (1994), denseness of
${}^k\mathfrak{N}(t)$ is assured if and only if the activation functions $\sigma$ are not
algebraic polynomials (almost everywhere). The property indicated in
Equation 5.4 is not a tautology noting that a measurable function, say $h$,
allows us to define a measure, say $\phi$, in terms of another, say $\mu$:

$$\int_X g(x) \, d\phi(x) = \int_X g(x)\, h(x) \, d\mu(x), \quad g \in C(X),\; X \subset \mathbb{R}^n,$$

and in view of this relation (cf, Rudin, 1986) $\int_X g(x) \, d\phi(x) = 0$ if
functions $g$ and $\phi$ (as decided by $h$ and $\mu$) have mutually disjoint
supports. Cybenko (1989) terms functions satisfying the property indicated
in Equation 5.4 as being discriminatory.

Denseness in representation provided by neural signal processors
is not restricted to the space of continuous functions nor to sigmoidal
activation functions as indicated below.

Proposition 5.1.2 (Refer Hornik's theorems on universal approximation
in neural networks reproduced in Leshno, Ya Lin, et al, 1994.)

1. If the activation function $\sigma$ is bounded and different from a
constant, for any finite measure $\mu$, for all $k = 1, 2, \ldots$, and for
all $t \in \mathbb{R}_{0,+}$, ${}^k\mathfrak{N}(t)$ is dense in $L^p(\mu)$, $1 \le p < \infty$.

2. If the activation function $\sigma$ is continuous, bounded and non-constant,
then for arbitrary compact subsets $\mathcal{C} \subset \mathbb{R}^n$, ${}^k\mathfrak{N}(t)$, for all
$k = 1, 2, \ldots$, and for all $t \in \mathbb{R}_{0,+}$, is dense in $C(\mathcal{C})$ with respect to
uniform distance.
The above two statements in Proposition 5.1.2 reaffirm the sense
of representation indicated in Theorem 5.1.1 (p. 243) with sigmoidal
activation functions. Before closing this preliminary discussion on representational
capacity, I draw attention to the fact that some of the
investigations pertaining to 'three-layer-sufficiency' for the representation
of continuous functions relate to statements of type-2 neural signal
processors. Of particular interest are works based on a theorem, related
to the representation of multivariate functions, due to Kolmogorov
(1957b) (refined later by Sprecher, 1965) and Arnold (1957) (see also
Hecht-Nielsen, 1987c; Girosi & Poggio, 1991; Kurkova, 1992; Lagunas,
Perez-Neira, et al, 1993 and Kovacec & Ribeiro, 1993).

In these investigations the networks suggested have the form

$$y(x) = \sum_{q=0}^{2n} g\left( \sum_{p=1}^{n} \pi_{pq}(x_p) \right) \quad \text{for all } x \in I^n.$$

While this form, on inspection, is immediately seen to have a similarity
with type-2 neural signal processors, differences exist. One difference
is in the fact that it is uncommon, in neural signal processors, to subject
the inputs to decision-making (ie, functions $\pi$) without any preprocessing;
this has, incidentally, led some researchers to comment that Kolmogorov's
theorem is not relevant to neural networks. In § 5.4 (p. 284)
I will expand on the representational capacity of neural signal processors
and, in this effort, will provide an interpretation of Kolmogorov's
theorem which will be suited to an appreciation of signal processing
with neural networks.
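
The nested-sum structure of the Kolmogorov-type form discussed above (an outer sum over $2n + 1$ terms, each applying an outer function to a sum of $n$ univariate inner functions) can be sketched as follows. This is only a structural illustration: the names `kolmogorov_form`, `demo_inner` and `demo_outer` are hypothetical, and the particular inner and outer functions are arbitrary stand-ins, not the functions whose existence the theorem asserts.

```python
import math

def kolmogorov_form(x, inner, outer):
    """Evaluate sum_{q=0}^{2n} outer( sum_{p=1}^{n} inner(p, q, x_p) )."""
    n = len(x)
    total = 0.0
    for q in range(2 * n + 1):          # 2n + 1 outer terms
        s = sum(inner(p, q, x[p]) for p in range(n))  # n univariate inner terms
        total += outer(s)
    return total

# Arbitrary stand-in functions (assumptions, for exercising the structure only).
def demo_inner(p, q, xp):
    return (p + 1) * math.tanh(xp + 0.1 * q)

def demo_outer(s):
    return math.tanh(s)

y = kolmogorov_form([0.2, 0.7], demo_inner, demo_outer)
```

Note that the inner functions act directly on the raw input components, which is the point of difference from neural signal processors remarked on in the text.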
5.2 Functional Nature of Neural Signal Processors

Neural signal processors of type-$k$ in the family ${}^{k}\mathfrak{N}(t)$ establish
functions, for all $t \in \mathbb{R}_{0,+}$, from $\mathbb{R}^{m_0}$ to $\mathcal{Y} \subset \mathbb{R}^{m_{k+1}}$, where $m_0 = n$
is the number of incident, or observed, channels. The functions are,
in general, indexed by $t \in \mathbb{R}_{0,+}$, a variable with the connotations of
time: under steady state considerations, this index is dropped for notational
convenience. In the sequel, the nature of functions established
by neural signal processors and their characterization will be in focus.
Extension of the ensuing statements to situations wherein neural signal
processing includes topological spaces, ie, when the input pattern
space $X$ (and, consequently, the observation space) is a manifold embedded
in the topological vector space $\mathbb{R}^{m_0}$ and the output pattern space
$\mathcal{Y}$ is a manifold embedded in the topological vector space $\mathbb{R}^{m_{k+1}}$, though
possible, has not been included in the scope of this work.

Proposition 5.2.1 Each processor in ${}^{k}\mathfrak{N}(t)$, for all $k$, $k = 1, 2, \ldots$,
and for all $t \in \mathbb{R}_{0,+}$, defines an operator from $\mathbb{R}^{m_0}$ to $\mathbb{R}^{m_{k+1}} \supset \mathcal{Y}$.

Proof: Recall the definition of neural signal processors: a type-1 neural
signal processor consists of three operational stages. In the first
stage, measurements $\eta_j$, $j = 1, 2, \ldots, m_1$, are evaluated from
the presented pattern $x$ and, if relevant, the past history of processor
operation. The second stage enables discrimination on the measurements,
through nonlinear evaluation of $\eta_j$, to get corresponding values
(discriminates), one for each $j = 1, 2, \ldots, m_1$. Finally, the discriminates
are linearly combined to provide a response which has the interpretation
of concept, category, estimate, etc, depending on the context
in which neural signal processors are being discussed.

It is quite easy to see that the measurement functions = 

1,2, . . , mi , are, each, by definition, indexed collection of operators from 
to (indexing being over 3to,+) by virtue of their being realized 
through an accumulation (in the discrete sense as summation and in 
the continuous sense as integration) of inner products and results of 
operators ((>jji\) acting on modulated by the result of operators 
) acting on . Similarly, the discrimination functions = 

$1, 2, \ldots, m_1$, are, each, operators from $\mathbb{R}$ to $\mathbb{R}$, and the aggregation of
the discriminates, over $j = 1, 2, \ldots, m_1$, is a linear operator from $\mathbb{R}^{m_1}$ to
$\mathbb{R}$. The function of a type-1 neural signal processor, at every time
instant in $\mathbb{R}_{0,+}$, being realized through an appropriate composition of
measurement, discrimination and aggregation operators, is, obviously,
an operator from $\mathbb{R}^{m_0}$ to $\mathbb{R}$. A neural signal processor of type-1, with
vector valued response (the output vector having $m_2$ elements) is, on the same
lines, an operator from $\mathbb{R}^{m_0}$ to $\mathbb{R}^{m_2}$, for all $t \in \mathbb{R}_{0,+}$.

Given an appropriate $k$-layered stacking of measurement and discrimination
stages, ultimately with a vector valued response having
$m_{k+1}$ elements, we observe, again, from the definition, that a type-$(k + 1)$
neural signal processor is realized by first subjecting the $m_{k+1}$ outputs
(of the given $k$-layered ensemble) to the measurement and discrimination
operators of the additional layer, followed by the linear operator aggregating
the resulting discriminates. This organization of processing immediately
suggests that the function of a type-$k$ neural signal processor
mapping from $\mathbb{R}^{m_0}$ to $\mathbb{R}^{m_{k+1}}$ at every $t \in \mathbb{R}_{0,+}$, in view of being derived
as a composition of operators, is, trivially, an operator.

□ 
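
The composition argument in the proof can be sketched in code. The following is a minimal illustration under simplifying assumptions that are not part of the thesis definition: measurements are taken to be plain affine inner products, the activation is `tanh`, and recurrence, lateral interaction and the time index are ignored. All function names and weights are hypothetical.

```python
import math

def measure(W, theta, x):
    """Measurement stage: eta_j = <w_j, x> - theta_j for each node j."""
    return [sum(w * xi for w, xi in zip(row, x)) - t for row, t in zip(W, theta)]

def discriminate(eta, sigma=math.tanh):
    """Discrimination stage: apply the activation function to each measurement."""
    return [sigma(e) for e in eta]

def aggregate(V, z):
    """Aggregation stage: linear combination of the discriminates."""
    return [sum(v * zi for v, zi in zip(row, z)) for row in V]

def type1(W, theta, V, sigma=math.tanh):
    """A type-1 processor as the composition aggregate . discriminate . measure."""
    return lambda x: aggregate(V, discriminate(measure(W, theta, x), sigma))

def stack(*processors):
    """Compose processors in order; stacking k type-1 processors gives a type-k one."""
    def run(x):
        for p in processors:
            x = p(x)
        return x
    return run

# Hypothetical weights: 2 inputs -> 3 nodes -> 2 responses, then 2 -> 2 nodes -> 1 response.
p1 = type1(W=[[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]], theta=[0.0, 1.0, -0.5],
           V=[[1.0, 0.0, -1.0], [0.0, 2.0, 1.0]])
p2 = type1(W=[[1.0, 1.0], [-2.0, 0.5]], theta=[0.0, 0.0], V=[[1.0, -1.0]])
type2 = stack(p1, p2)
response = type2([0.3, -0.7])
```

Because each stage is itself a function between real vector spaces, the stacked processor is a function from the input space to the response space by construction, which is all the proposition claims.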

Continuity of the operator is assured only if, in all the participating
nodes, the activation function ($\sigma$) is continuous. Boundedness of the
operator induced by neural signal processors follows, trivially, from
boundedness of the activation functions employed.

Proposition 5.2.2 (Type number additivity.) Every vector-valued neural
signal processor of type-$k$, a member of ${}^{k}\mathfrak{N}(t)$, for all $k$, $k = 2, 3, \ldots$,

has a decomposition in terms of neural signal processors of lower type
numbers

for some $k_1 = 1, 2, \ldots, k - 1$, for all $x \in \mathbb{R}^{m_0}$ and $t \in \mathbb{R}_{0,+}$.

Proof: Note that in a neural signal processor of, say, type-$k_1$, $k_1 =
1, 2, \ldots, k - 1$, the aggregation of outputs from discriminates,
$j = 1, 2, \ldots, m_{k_1}$, is expressible as the linear transformation

0^'=’ (x, t) = (x, t) - 
where,

W<‘«> = 

and 

Consequently, a processor described as 

is identical, in functionality, with a processor in ^91 noting that the 
outer processor (of type-fc - ki) evaluates® Ij^) j(^) = 

1,2, . . , mo = as components of the first layer of measurements 

These evaluations do not alter the linearity of inner products and allow
for a simple concatenation of the constituent layers of measure-
discriminate-aggregate stages.

□ 

This decomposition, a simple consequence of the definition, is, in 
general, non-unique and allows the following to be anticipated. 

Proposition 5.2.3 Every neural signal processor of type-$k$, for all
$k$, $k = 1, 2, \ldots$, has a decomposition involving $k$ type-1 neural signal
processors, ie,

such that o o » • 

⁸ For reasons of notational clarity, entities relevant to the processor in are

shown with the accent “ 
Proof: This statement follows directly from Proposition 5.2.2 by taking
$k_1 = 1$ and recursing on the processors of type larger than 1.

□ 


(Note that dependence of the response of the neural signal processors 
on (x, t) has not been explicitly shown.) 

While the above statements are quite obvious, they do highlight that,
at an architectural level, neural signal processors of all types are composed
of layers of in star-out star neuronal fields⁹ and provide a simple,
though inadequate, justification for the representational potential of
type-$k$ neural signal processors, noting that, in view of function composition,
the denseness of ${}^{k}\mathfrak{N}$ in $C(X)$ (also $L^p(\mu)$) follows from
the density theorems for ${}^{1}\mathfrak{N}$. It is to be noted that all neural signal
processors of type-$k$, $k = 2, 3, \ldots$, can be viewed as a type-$(k - k_1)$ processor
operating on the result of a type-$k_1$ neural signal processor. This
suggests that preprocessing, if any, of the presented signals (patterns)
can be sought to be represented in a neural basis.

An organizational feature common to neural signal processors, noting
the layers of in star-out star neuronal fields, is that the desired processing
is achieved through stages of measure-discriminate-aggregate
layers. Each layer, essentially a processor in ${}^{1}\mathfrak{N}$, realizes its function
through measurement, discrimination and aggregation stages. In the

⁹ In Chapter 2 the notions of in star and out star neurons and neuronal fields have been
introduced.
common presentations, measurement and aggregation are captured as
linear processors and discrimination is incorporated through the nonlinear
activation functions: the resultant, from the point of view of
classification, categorization and recognition, should, inevitably, be a
nonlinear operation so that the ensuing processor is non-trivial. These
observations motivate the following.

Axiom 5.2.1 Axiom of Organization.

A neural signal processor is composed of (layers of) three operational 
stages: measurement, discrimination and aggregation in that order. 
Preprocessing, if any, (preceding, or incorporated in, the measurement) 
is sought to be represented in a neural basis. Measurements are effected 
on an observation space constructed as the Cartesian product of the 
input space and a relevant subspace of a union of the space of responses 
of the distinct layers. 

Though the above axiom suggests the necessity of three operational
stages, these need not be distinct. Processing schemes wherein the nonlinear
nature of the operation results from measurements rather than
discrimination are known in the literature (eg, Davidson & Hummer,
1993). Recall that nonlinearity in the processor functionality is
considered essential in concept realization through decision making. In
view of the interpretation that outputs of neural signal processors are
concepts, the following statement is a consequence of Proposition 5.2.2
(p. 251).
Proposition 5.2.4 A given situation of concept realization (ie, function
mapping) can be achieved with different choices of 'intermediate
concepts,' the latter also being known as 'sub-concepts.'

Proof: From Proposition 5.2.2, we note that a concatenation of type-$k_1$
neural signal processors with type-$(k - k_1)$ processors results in type-$k$
neural signal processors, the resulting structure being equivalent
to a type-$(k - k_1)$ processor acting on the response of a type-$k_1$
processor. If the responses of the type-$k_1$ processor are
now interpreted as concepts, we note that, in view of their evaluation,
for $j = 1, 2, \ldots, m_0 = m_{k_1 + 1}$, in the components
of the measurements of the outer processor, uniqueness of the final response
is decided by the uniqueness of the product of the measurement weights of
the outer processor with the aggregation weights of the inner processor,
rather than that of either factor.

□ 
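
The non-uniqueness of intermediate concepts can be illustrated numerically. The sketch below assumes, for simplicity, that aggregation in the inner processor and measurement in the outer processor are plain matrix-vector products; the matrices `V1`, `W2` and the invertible re-mixing matrix `A` are hypothetical. Replacing `V1` by `A V1` and `W2` by `W2 A^{-1}` changes the intermediate concepts but leaves the product of weights, and hence the outer measurements, unchanged.

```python
import math

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# Discriminates of the inner layer (three nodes), for some fixed input.
z = [math.tanh(0.3), math.tanh(-1.2), math.tanh(0.8)]

V1 = [[1.0, 2.0, -1.0], [0.5, -1.0, 3.0]]   # aggregation weights of the inner processor
W2 = [[2.0, 1.0], [-1.0, 4.0]]              # measurement weights of the outer processor

A = [[2.0, 1.0], [1.0, 1.0]]                # any invertible re-mixing (determinant 1 here)
A_inv = [[1.0, -1.0], [-1.0, 2.0]]

V1_alt = matmul(A, V1)                      # different intermediate concepts
W2_alt = matmul(W2, A_inv)                  # compensating measurement weights

concepts, concepts_alt = matvec(V1, z), matvec(V1_alt, z)
eta, eta_alt = matvec(W2, concepts), matvec(W2_alt, concepts_alt)
```

Here `concepts` and `concepts_alt` differ while `eta` and `eta_alt` agree up to rounding, which is the sense in which only the product of the two weight sets is determined by the final response.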

This situation is to be contrasted with the multiplicity of representations
shown in Chapter 4: while in § 4.1 non-uniqueness of the solutions
for terms given by a product of weights was in focus, Proposition 5.2.4
refers to the non-uniqueness in realizing the weights through
products. A consequence of the above statement, in particular with a
signal processing perspective, is that computational cognitive science
can, at best, discuss operationally sufficient models and not logically
necessary models. The possibility of allowing given situations of concept
representation using different intermediate concepts is reflected,
at an organizational level, in terms of the feed-through interconnection
strengths $s$ being composed by aggregation weights of one layer and
measurement weights of the succeeding layer, leading to an alternative
definition for neural signal processors (with unit lateral delays $\tau$).

Definition 5.2.1 Neural signal processors of type-$k$, ${}^{k}\mathfrak{N}(t)$, are given

by the functional form


• (^) 

{x,t) 




+ ffijco i) + S.% ^ - 1)] . 

{5.5a) 



(5.5b) 



(5.5c) 


for all = 1,2, . . , mi, for some mi = 1,2,. 

• > 


for all — 1,2, /hr so/ne = 1,2,. ; 

^ = 1,2, ,A:, 

where, y, t), t, a, a, b, s, e and 6 have the same interpretations as 
in Equation 5.1 (p. 238) and = gi, Wt e Weights w and 

€ are associated with measurement and v with (concept) aggregation; 

and C for all = 1 , 2 ,..., me and 
for all = 1,2,,.., m^, £=1,2,. .,k, where, me denotes the number 
of decision units participating in layer £ and denotes the number of 

(sub) concepts realized in layer £, i = 1,2, . , with mo = £ = 

1, 2, denotes the biases necessary in realizing the relevant numerical 
assignments corresponding to the (sub) concepts, 6^^^ e 
Characterization of neural signal processors is best understood in
terms of the functions (operators) induced between the manifolds $X$ and
$\mathcal{Y}$. Type-1 (deterministic, non-evolutionary) processors (${}^{1}\mathfrak{N}$) with hard-limiting
activation functions have been shown (see eg, Lippmann,
1987) to effect a separation of the input space through (hyper) planes.
Networks of such processors have also been shown to partition the
input space in terms of convex and/or non-convex regions depending
on the number of layers in the network. I will now address the issue
of processor characterization in a slightly general framework, basically
to arrive at a specification of the operator induced (from $X$ to $\mathcal{Y}$) by a
type-$k$ neural signal processor.

Theorem 5.2.1 Measurements in neural signal processors of type-$k$,
ie, members of ${}^{k}\mathfrak{N}(t)$, $k = 1, 2, \ldots$, partition the input manifold $X$, of
dimensionality $n$, in terms of manifolds of dimension no more than
$n - 1$; in situations wherein the activation functions have stretches of
constancy, this dimension is no less than $n - 2$.¹⁰

Proof: $\eta_j^{(\ell)}$, for all $j = 1, 2, \ldots, m_\ell$, $\ell = 1, 2, \ldots, k$, having its continuity

inherited from that of the abstract amplification function $a$, the abstract

translation function $b$ and the activation function $\sigma$, the continuity of

$\eta_j^{(\ell)}$ depends on $a$ and $b$ corresponding to the processing node in

layer $\ell$ and $a$, $b$ and $\sigma$ of all processing nodes in layers 1 to $\ell - 1$: in

¹⁰ This statement suggests the interesting possibility of using, as candidates of activation,
functions whose preimages, in the domain, are of fractional dimension. In this
thesis, however, this issue has not been taken up.
the popular neural models, ie, models of additive as well as shunting
dynamics, continuity of $a$ and $b$ is assured in all nodes and the activation
functions are allowed to have up to countable discontinuities. Thus the
measurements, $\eta$, will be considered as continuous functions over $X$;
these functions are indexed over $\mathbb{R}_{0,+}$ (through the variable $t$).

Noting that for the admissible values of $j$ and $\ell$, $\eta_j^{(\ell)}$ refers to
time-indexed spatial measurements, it is simple to see that these are, indeed,
manifolds¹¹ (ie, surfaces) over the input space. Continuity of $\eta$ in the
temporal index variable ($t$) reflects the nature of evolution of the measurements
over the input space. It is now simple to see that the measurements
$\eta$ at all layers, nodes and points $t \in \mathbb{R}_{0,+}$
induce manifolds in the input space and the collection, in the sense of
a set union, of such manifolds over the possible assignments to $\eta_j^{(\ell)}(\cdot, t)$
describes the entirety of the input space for each admissible value of
$j$, $\ell$ and $t$: the individual manifolds for any given values of $j$, $\ell$
and $t$ are themselves disjoint. The claim regarding the dimension of
the manifolds induced in the input space by measurements $\eta$ will be
established using mathematical induction on layering (type number).



¹¹ A manifold is a geometrical structure that is homeomorphic to
Verification

When we consider isolated neurons, ie, trivial neural signal processors
of type-$k$ for all $k = 1, 2, \ldots$, the claim is immediately apparent
as the measurements $\eta$ induce a partitioning of an $n$-dimensional input
space in terms of (hyper) planes, each of dimension $n - 1$ and of
zero measure with respect¹² to the Lebesgue measure of $n$ dimensions.
The activation function $\sigma$ alters only the manner of appreciating the
manifolds induced on the input space by measurements and not the
manifolds per se: the manifolds induced on the input space as a result
of activation functions, on members in the manifolds of measurement,
will be termed manifolds induced by decisions. In the case of activation
functions that are continuous the dimension (and measure) of the
manifold is not altered. Activation functions that have stretches of constancy
(eg, hard-limiting functions) regroup the manifolds induced by
measurement to ensure that the resulting manifolds, being continuous
unions¹³ of $(n - 1)$-dimensional mutually disjoint manifolds, are of finite
(and non-zero) measure and a dimension no less than $n - 2$. It is now
simple to see that the same arguments hold for the measurements of
type-1 neural signal processors as such processors are, operationally,


¹² In the rest of this proof the measure will always be taken to be Lebesgue measure on
$n$ dimensions. This aspect will not be indicated explicitly.

¹³ Note that while the manifolds induced by measurements need only $n - 1$ distinct
basis vectors for a complete description, the manifolds induced by decisions, when the
activation functions have regions of constancy, need all of $n$ distinct basis vectors; however,
the entire scope of only $n - 1$ of these basis vectors is used for description: variation along
the remaining basis vector is restricted to a (compact) proper subset of the total extent of
variation.
no different from being an array of neurons, each being identical in
functionality to an isolated neuron, the outputs of which are linearly
combined in an attempt to realize the desired processor.

Inference

Consider the measurements $\eta^{(\ell)}$. These measurements are evaluated
as linear combinations of the concepts in a type-$(\ell - 1)$ neural signal processor
and thereby the decisions of a neural network of $\ell - 1$ layers.
Noting that the measurement of a point $x$ in the input manifold $X$ is
defined only when the decisions of every node in the $(\ell - 1)$-layer network
participating in the synthesis of $\eta^{(\ell)}$ are defined, the manifold induced in
$X$ by the distinct assignments to $\eta^{(\ell)}$ is an intersection of $m_{\ell - 1}$ decision
regions (the number of processing nodes in layer $\ell$, for all $\ell$, has
been denoted earlier by $m_\ell$) and this region, if it exists, has a dimension
given by $\max\{0, n - m_{\ell - 1}\}$, wherein the dimension of the manifolds induced by the
decisions taken in a network of $\ell - 1$ layers has been assumed to be
$n - 1$. The manifold induced in $X$ by assignments to $\eta^{(\ell)}$ being continuous
unions of such regions, the implication of these manifolds being
of dimension no larger than $n - 1$ is immediately seen. By an identical
reasoning it can be established that the dimension of the manifolds
induced in $X$ by assignments to $\eta^{(\ell)}$ is constrained, on the upper side,
to be one less than the smallest dimension of the manifolds induced in
$X$ by the participating decisions.
In the event the activation function is continuous, it is immediately
apparent that the manifolds induced by $\eta^{(\ell)}$ are homeomorphic to (hyper)
planes of identical dimension: this aspect has been indicated in
the illustration. However, when the activation function has regions
of constancy (for simplicity I assume that such regions are observed
in the activation functions of all processing nodes) the intersections of
the manifolds of decisions in the network of $\ell - 1$ layers will be convex
regions; thereby, the manifolds induced in $X$ will be a chaining of such
(local) convex regions, the chaining being along a manifold which is no
different from that induced by measurements when continuous activation
functions are involved. The dimension of this union of convex
regions is easily seen to be no less than $n - 2$ when all manifolds induced
by decisions of the $(\ell - 1)$-layered network, on which the realization
of $\eta^{(\ell)}$ is based, are of a dimension no less than $n - 2$.

Conclusion

A verification of the claim for type-1 neural signal processors and the
assurance of the validity of this result for a type-$\ell$ processor conditional
on its validity for type-$(\ell - 1)$ processors suffices to establish the stated
claim. Note that this proof has not needed an explicit use of the node
indices and the temporal index $t$.

□ 


Note that in the above theorem the partitioning of the input manifold
by measurement functions is of a varying (adaptive) nature when
recurrence or lateral interaction is involved. The past history of (intermediate)
concepts, when incorporated in a measurement of incident
concepts (ie, inputs), serves to revise the translation effected by the
function $b$ and, thereby, while the dimensionality of the members of the
input space partitioning is unaffected, the manner in which partitioning
is effected evolves in neural signal processors that incorporate recurrence and/or
lateral interaction. Evolution in the partitioning on the input manifold
has been the basis (in schemes suggested in the literature) of incorporating
search in neural networks. In order to appreciate the nature
of manifolds induced by the measurement functions of neural signal
processors, the following is introduced (cf, Lawson, 1974; Itô, 1987).

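Definition 5.2.2 A foliation¹⁴ of codimension $q$ (alternatively dimension
$p = n - q$) on an $n$-dimensional manifold $M$, $0 < q < n$, is a
family $\mathfrak{F} = \{L_\alpha \mid \alpha \in \mathcal{A}_{\mathfrak{F}}\}$ of arc-wise connected subsets, called leaves, of
$M$ with the following properties:

(i) $L_\alpha \cap L_{\alpha'} = \emptyset$ if $\alpha \neq \alpha'$, $\alpha, \alpha' \in \mathcal{A}_{\mathfrak{F}}$.

(ii) $\bigcup_{\alpha \in \mathcal{A}_{\mathfrak{F}}} L_\alpha = M$.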

¹⁴ More precisely, this is termed a codimension $q$, class $C^r$ foliation of $M$. The notion
of foliations, essentially complex geometrical structures, has also been considered by
Lawson (1974) and Itô (1987) on structures other than manifolds.

A manifold is, roughly speaking, a space locally modeled on affine space,
and a submanifold is a subset locally modeled on an affine subspace. In
this spirit, a foliated manifold is a manifold modeled locally on an affine
space decomposed into parallel affine subspaces (Lawson, 1974).





(iii) Every point in $M$ has a neighborhood $U$ and a system of local,
class $C^r$ coordinates $x = (x_1, x_2, \ldots, x_n) : U \to \mathbb{R}^n$ such that for
each leaf $L_\alpha$, $\alpha \in \mathcal{A}_{\mathfrak{F}}$, the components of $U \cap L_\alpha$ are described by
the equations $x_{p+1} = \text{constant}, \ldots, x_n = \text{constant}$.

Every leaf of $\mathfrak{F}$ is an $(n - q)$-dimensional submanifold of $M$. A simple
example of a foliation is the collection of (hyper) planes defined in an
isolated neuron by an operation described in Equation 3.1a. Figure 5.3
(p. 264) illustrates the notion of foliations. In the definition of foliation
suggested by Lawson (1974) as well as Itô (1987), characteristics
of the set $\mathcal{A}_{\mathfrak{F}}$ which supports the indexing of leaves have not been mentioned.
As explained in the following, in the context of neural networks,
this set is of paramount importance and I will refer to the set $\mathcal{A}_{\mathfrak{F}}$ as the
stem¹⁵ of the foliation $\mathfrak{F}$. For convenience, the set will be considered
to be of dimension $q$, the codimension of the foliation $\mathfrak{F}$. The nature of
partitions induced in the input space by the measurement functions $\eta$
motivates the following.

Axiom 5.2.2 Axiom of Measurement.

A neural signal processor, through the measurement functions in each of 
the processing (decision making) nodes, induces a foliation, of codimen- 
sion at least one, in the input manifold. This foliation forms the basis of 
synthesizing (approximating) the desired level curves of the function. 
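
As a hedged illustration of the axiom, the sketch below treats an isolated neuron as computing an affine measurement $\eta(x) = \langle w, x \rangle - \theta$ (an assumption standing in for Equation 3.1a, with hypothetical weights). Each value of $\eta$ indexes one leaf, a hyperplane: moving along a direction orthogonal to $w$ stays on the leaf, while moving along $w$ changes the leaf.

```python
def eta(w, theta, x):
    """Affine measurement of an isolated neuron; its value indexes the leaf through x."""
    return sum(wi * xi for wi, xi in zip(w, x)) - theta

def move(x, direction, step):
    return [xi + step * di for xi, di in zip(x, direction)]

w, theta = [2.0, 3.0], 1.0          # hypothetical weights and threshold
x = [1.0, 1.0]
tangent = [3.0, -2.0]               # orthogonal to w, hence tangent to every leaf

same_leaf = move(x, tangent, 0.75)  # stays on the hyperplane eta = eta(x)
other_leaf = move(x, w, 0.5)        # crosses to a different hyperplane

leaf_index = eta(w, theta, x)
```

The stem of this foliation is the set of values taken by `eta`, here the real line, so the codimension is one.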

¹⁵ A more appropriate terminology for the set $\mathcal{A}_{\mathfrak{F}}$ is stalk of a foliation. However, the
term 'stalk' is used in the theory of sheaves (Tennison, 1975) to mean a subspace of the
manifold and not an index set for leaves.
Figure 5.3: Illustration of a foliation on a manifold

Though, in this thesis, the foliation induced by isolated neurons is 
not considered to be more complicated than that of a partition through 
(hyper) planes, such a general geometric structure has been incorpo- 
rated into the axiom of measurement for the following two reasons. 


1. The polynomial neurons of Cover (1965), higher order neurons of
Spirkovska & Reid (1992), morphological neurons of Davidson
& Hummer (1993) and neurons with functional links (Pao, 1989)
partition the input manifold in terms of (hyper) surfaces rather
than (hyper) planes. Figure 5.4 illustrates the foliation induced in
these non-conventional neurons: the foliation is considered on $\mathbb{R}^2$,
a two dimensional space in which the input space is embedded.¹⁶


¹⁶ In Figure 5.4 the foliations induced by polynomial neurons are indicated in (a) and (b),
morphological neurons in (c) and (d), a higher order neuron in (e), a neuron with functional
links in (f) and a neuron which imposes elliptical basis functions (a generalization of radial
basis functions) in (g).
Figure 5.4: Foliation on $\mathbb{R}^2$ in non-conventional neurons: (a) $\eta(x) =
2x_1 + 3x_2 - 2x_1 x_2$, (b) $\eta(x) = (2x_1 + 3x_2 + 2)(3x_1 - 8x_2 +
12)(7x_1 - x_2 + 1)$, (c) $\eta(x) = (2 \wedge x_1) \vee (5 \wedge x_2)$, (d) $\eta(x) =
(5 \wedge x_1) \vee (-2 \wedge x_2)$, (e) $\eta(x) = 2x_1^2 + 5x_2^2 + 3x_1 x_2 + x_1 + 3x_2$, (f)
$\eta(x) = \sin(x_1) + \tan^{-1}(x_2)$, (g) $\eta(x) = (x_1 + 2)^2 + (x_2 - 5)^2$






2. Though the leaves of the foliation induced on an input manifold
by isolated neurons of the type indicated in Equation 3.1
(p. 110) are (hyper) planes, the nonlinear nature of activation
functions (especially when sigmoidal functions are in use) implies
that the leaves of foliations induced by processing nodes in layered
neural networks are (hyper) surfaces, the 'curvature' increasing,
in general, with the depth of layering.
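
Two of the measurement functions listed in the caption of Figure 5.4 can be evaluated directly to see the kind of leaves they induce. The sketch assumes the usual lattice reading of the symbols in panel (c), with $\wedge$ as minimum and $\vee$ as maximum; the function names and sample points are hypothetical.

```python
import math

def eta_c(x1, x2):
    """Panel (c), reading 'wedge' as min and 'vee' as max (an assumption)."""
    return max(min(2.0, x1), min(5.0, x2))

def eta_g(x1, x2):
    """Panel (g): squared distance from the point (-2, 5)."""
    return (x1 + 2.0) ** 2 + (x2 - 5.0) ** 2

# Points on the circle of radius 3 around (-2, 5) all lie on one leaf of eta_g.
angles = (0.0, 1.0, 2.5, 4.0)
circle = [(-2.0 + 3.0 * math.cos(a), 5.0 + 3.0 * math.sin(a)) for a in angles]
circle_values = [eta_g(x1, x2) for x1, x2 in circle]

# eta_c saturates: every point with x1 >= 2 and x2 >= 5 shares the same value.
saturated = [eta_c(2.0, 5.0), eta_c(10.0, 10.0), eta_c(3.0, 100.0)]
```

The leaves of `eta_g` are closed curves (circles), whereas the top leaf of `eta_c` is a whole quadrant rather than a curve, so the two neurons partition the plane very differently from a hyperplane neuron.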

In a foliation on an input manifold, the leaves can have multiple components,
each component being, in general, a connected region: Figure 5.4
(a), (b) and (f) indicate leaves which have multiple components. The
characterization of multi-layered neural networks (with hard-limiting
activation function) provided by Lippmann (1987) is really a characterization
of the leaves in the foliation on the space of input patterns
induced by measurement functions.

Leaves of the foliation in isolated neurons are convex regions with a 
single component (ie, (hyper) planes). The leaves in type-1 neural signal 
processors are piece-wise (hyper) planar with a single component that 
is, in general, not convex (however, the union of an uncountably large 
number of such leaves can still be convex) and the leaves in neural 
signal processors that are of a type larger than unity have multiple 
components, each component being piece-wise (hyper) planar. 

Piece-wise (hyper) planarity in the leaves introduces a 'curvature,'
thereby allowing the leaves of type-$k$ neural signal processors, for all
values of $k$, $k = 1, 2, \ldots$, to be closed: the closed nature of leaves is
related to the representation of local features. (Figure 5.4 (f) and (g)
indicate leaves of foliation that have components that are closed.) Superiority
of approximation with non-conventional neurons, as claimed
in the literature, is easily traced to the presence of multiple components
and 'curvature' in the leaves of the foliation.¹⁷

Measurements, in neural signal processors, provide the discriminants
which, through operations of the same genre as comparison,
form the basis of decisions presented, to the external world, as neural
response. The discriminants are specified by the stem of the foliation
induced on the input manifold by the measurement functions. Discriminatory
functions which provide the mechanism of associating measurements
on the incident inputs to neural action or categories, regardless
of the specific details (see § 2.2 for different types of activation functions),
serve to establish equivalences between distinct measurement
values, thereby associating distinct members of an input space with
common clusters; the nature of the clusters is decided by the number
of components and the nature of closedness in the leaves of the foliation
induced by measurement functions and the kinds of association
between leaves through the activation functions.

¹⁷ Isolated neurons of the type indicated in Equation 3.1 (equated with processors capable
of realizing linear separable dichotomies), as shown in Chapter 3, represent a fraction
of the total number of functions possible on discrete input spaces; this fraction vanishes as
the dimensionality of the input space increases. Contrasting this with the situation in an
interconnected schema of Turing Machines wherein each processor is allowed to realize
all possible functions on the discrete space, an enquiry on the nature of basic processing
units necessary and/or sufficient in an ensemble to accommodate a realization, with considerations
of efficiency, of the desired (information) processing task is prompted. This
enquiry, however, has not been included in the scope of this thesis.

The essential purpose of discrimination is to effect a foliation of the
input manifold on the basis of the foliation induced by measurements:
the leaves of the foliation due to discrimination are unions of leaves of
the foliation due to measurement. Equivalently, discrimination effects
a transformation between the stems of foliations without actually altering
the system of local coordinates in the neighborhoods of points
in the input manifold. Let ${}^{m}\mathfrak{F} = \{{}^{m}L_\alpha \mid \alpha \in \mathcal{A}_{{}^{m}\mathfrak{F}}\}$ denote the foliation
on the input manifold due to measurements and ${}^{d}\mathfrak{F} = \{{}^{d}L_\alpha \mid \alpha \in \mathcal{A}_{{}^{d}\mathfrak{F}}\}$
denote the foliation on the input manifold due to discrimination.

Noting that the activation function establishes a transformation of
$\mathcal{A}_{{}^{m}\mathfrak{F}}$, the space of discriminants, into $\mathcal{A}_{{}^{d}\mathfrak{F}}$, the space of decision labels,
ie, $\sigma : \mathcal{A}_{{}^{m}\mathfrak{F}} \to \mathcal{A}_{{}^{d}\mathfrak{F}}$, the leaves of the foliation due to discrimination
are given, in terms of the leaves of the foliation due to measurement
functions, as

$${}^{d}L_\alpha = \bigcup_{\alpha' \in \sigma^{-1}(\alpha)} {}^{m}L_{\alpha'},$$

where $\sigma^{-1}(\alpha)$ refers to the preimage of $\alpha \in \mathcal{A}_{{}^{d}\mathfrak{F}}$ in $\mathcal{A}_{{}^{m}\mathfrak{F}}$ under the
activation function $\sigma$. In this light the following characterization of the
requirement of discriminatory functions, in addition to the specification
made earlier in connection with representation potential, is of interest.
Axiom 5.2.3 Axiom of Discrimination.

A neural signal processor, through its discriminatory functions, renews
the foliations, induced on the input space by the measurement functions,
through a transformation, of the stems of the foliations, with at least one
of the following properties:

1. alter the indexing of leaves to retain distinctness in a finite non-zero
number of local regions of the input space,

2. introduce multiple components in the leaves,

3. associate, to at least one component of a leaf of the foliation due
to discrimination, uncountably many leaves of the foliation due to
measurement.

Re-foliations provide the basis for establishing equivalences between 
members (elements) of the input space in ways not possible through the 
chosen measurement functions. 
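
The second and third properties of the axiom can be observed numerically. The following is a minimal sketch under stated assumptions, not a construction taken from this thesis: the weight vector `w`, the sampled input region and the two activation functions (a crisp threshold, and a quantized version of tanh(u) - tanh(u - 2)) are hypothetical choices made only for illustration.

```python
import numpy as np

def measurement(x, w):
    """Linear measurement: the leaves are the hyperplanes w.x = u."""
    return x @ w

def crisp_activation(u, theta=0.0):
    """Threshold activation: maps every discriminant u to a label in {0, 1}."""
    return (u >= theta).astype(int)

def bump_activation(u):
    """Non-monotonic activation tanh(u) - tanh(u - 2), quantized at 0.5."""
    return (np.tanh(u) - np.tanh(u - 2.0) >= 0.5).astype(int)

rng = np.random.default_rng(0)
x = rng.uniform(-4.0, 4.0, size=(2000, 2))   # sample points of the input space
w = np.array([1.0, 0.0])                     # hypothetical weight vector
u = measurement(x, w)                        # stem coordinate of each point

# Property 3: the label-1 leaf collects every measurement leaf with u >= 0.
labels = crisp_activation(u)
leaf_one_u = u[labels == 1]

# Property 2: the label-0 leaf of the bump activation has two components,
# separated by the interval of u on which the bump exceeds 0.5.
bump = bump_activation(u)
zero_u = np.sort(u[bump == 0])
gap = float(np.max(np.diff(zero_u)))         # width of the separating interval
```

In this sketch the stem of the measurement foliation is the real line, so a single decision label of the threshold activation is associated with a continuum of measurement leaves, while the non-monotonic activation yields a leaf whose stem coordinates fall into two disjoint intervals.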

As a reordering of foliations, ie, a recreation of the partition on the
input space, is the key objective of discrimination, it is essential that
the functions that accomplish this task be different from linear: typical
choices for discriminatory functions incorporate reordering through
comparison with one or more thresholding parameters. The role of
discrimination is one of deciding on features provided by measurements
and the functional nature of discrimination functions is to involve a
comparison of features discovered in the input patterns with templates
of the features being tested for: this restricts the choice of activation
functions, in the context of measurements inducing foliations of
codimension one, to piece-wise* monotonic functions, each monotonic
segment providing a graded comparison. Figure 5.5 illustrates a few
examples of the admissible activation functions: in this illustration,
the stem of the foliation due to measurements is identified with $\mathbb{R}$.

Figure 5.5: Illustration of admissible activation functions:
(a) $\sigma(\xi) = \ldots$, (b) $\sigma(\xi) = \sigma_b(\xi) = \tanh(\xi) - \tanh(\xi - 2)$,
(c) $\sigma(\xi) = \sigma_b(\xi + 4) + 2\sigma_b(\xi - 6)$, (d) $\sigma(\xi) = \ldots$ for
$\xi \in [-6, +6]$, and $\sigma(\xi) = \ldots$ otherwise, $\xi \in \mathbb{R}$.

* The notion of piece-wise monotonic functions, though vacuous in the sense that every
function is expressible as an alternation of monotonically increasing and monotonically
decreasing segments, is being used to explicitly indicate the alternating stretches of
monotonic variation.
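
The alternating stretches of monotonic variation can be counted numerically for the two activation functions of Figure 5.5 whose expressions are legible in the caption, panels (b) and (c). The sketch below is illustrative only: the helper `count_monotonic_segments`, its sampling range and its tolerance are assumptions introduced here and are not part of the thesis.

```python
import numpy as np

def sigma_b(xi):
    """Panel (b) of Figure 5.5: tanh(xi) - tanh(xi - 2)."""
    return np.tanh(xi) - np.tanh(xi - 2.0)

def sigma_c(xi):
    """Panel (c) of Figure 5.5: sigma_b(xi + 4) + 2 * sigma_b(xi - 6)."""
    return sigma_b(xi + 4.0) + 2.0 * sigma_b(xi - 6.0)

def count_monotonic_segments(f, lo=-20.0, hi=20.0, n=40001, tol=1e-9):
    """Count maximal stretches on which the sampled values of f rise or fall."""
    xi = np.linspace(lo, hi, n)
    d = np.diff(f(xi))
    signs = np.sign(d[np.abs(d) > tol])      # ignore numerically flat steps
    return 1 + int(np.count_nonzero(signs[1:] != signs[:-1]))

segments_tanh = count_monotonic_segments(np.tanh)   # plain sigmoid-like function
segments_b = count_monotonic_segments(sigma_b)      # one region of locality
segments_c = count_monotonic_segments(sigma_c)      # two regions of locality
```

Each additional monotonic segment adds a graded comparison, and every pair of rising and falling segments corresponds to one region of locality on the stem.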
The comparison is considered as representing operations of a sentential
calculus of (an appropriate) logic incorporating uncertainty when
the discriminatory function is continuous. Multiplicity in the number
of monotonic segments introduces unconnectedness (due to multiplicity
of components) in the leaves of the foliation induced, in the input space,
by decisions and finiteness of the number of components assures that
the re-ordering is not trivial. Discontinuities in the discriminatory (ie,
activation) functions, favored when crisp categorization is needed, are
to be finite in order that the categorization problem be computable (see
Hopcroft & Ullman, 1989 for the notion of computability). Representation
of crisp categories necessitates non-strict monotonic variation in
the discriminatory functions (indicated by property 3 of the axiom of
discrimination).

Arguments in the literature related to the superiority of approximation
and function realization provided by measurement functions
that induce a foliation whose leaves are not linear subspaces of the
input manifold and activation functions that incorporate locality, eg,
radial basis functions ($y(x) = \sigma(\lVert x - w \rVert)$, where $w$ is the template of
the pattern being tested for), provide a characterization of the foliation
induced by measurement and not that by discrimination: despite
non-monotonicity in the activation function, the one-sidedness
of the discriminants restricts the leaves of the foliation due to
discrimination to have the same number of components as the leaves of the
foliation due to measurement. It can be conjectured, at this stage, that
a choice of activation functions that have more than one region of
locality, thereby inducing the leaves of the foliation due to discrimination
to have multiple components, reduces the number of layers and the
number of processing nodes required for a given realization problem of
smooth functions.

It is important to appreciate that concepts are aggregates of features
(reflected by a discrimination of measurements on the input space),
the aggregation process being dependent on the nature of the concept,
ie, taxonomical or complexive. The objective of aggregation is the
inverse of measurement in the sense that while measurement derives
features from input instances or examples, aggregation is to synthesize
responses from decisions. Inputs are of the same genre as their
responses in the same way that pattern features are of the same genre
as decisions taken on these features and, hence, the following aspect
of aggregation, together with the preceding axioms, would provide a
characterization of neural signal processors.


Axiom 5.2.4 Axiom of Aggregation.

A neural signal processor, through its aggregation function, synthesizes 
(or approximates) the level regions of processor response through a fo- 
liation on the Cartesian product of the stems of foliations on the input 
space due to discrimination. Concepts, in neural signal processors, are 
identified with the level regions of processor response. 
Thus if, in a certain level of discrimination, the foliations induced by
the processors are denoted by
${}^{d}\mathfrak{F}^{(i)} = \{ {}^{d}L^{(i)}_{a_i} \mid a_i \in A_{d}^{(i)} \}$, $i = 1, 2, \ldots, m$,
then the leaves of the foliation
${}^{a}\mathfrak{F} = \{ {}^{a}L_{a'} \mid a' \in A_{a} \}$ on the space of
input patterns due to aggregation are given by

$${}^{a}L_{a'} = \bigcup_{a \in {}^{\times}L_{a'}} \; \bigcap_{i=1}^{m} {}^{d}L^{(i)}_{a_i} , \qquad a' \in A_{a} , \tag{5.6}$$

where ${}^{\times}L_{a'}$ denotes a leaf in the foliation ${}^{\times}\mathfrak{F}$ on
$A_{d}^{(1)} \times \cdots \times A_{d}^{(m)}$ and $a = [a_1, a_2, \ldots, a_m]$.
Note that in the foliation on the collection
of input patterns, described through the foliation ${}^{\times}\mathfrak{F}$ on the Cartesian
product of the stems of the foliations on the input space due to
discrimination, $A_{a}$ (the stem of the foliation due to aggregation) refers to
the collection of responses of the neural signal processor. From the
arguments leading to Proposition 5.2.4
(p. 255) it can be seen clearly that the foliation induced on the input
space due to measurement functions operating on the responses of a
type-$k$ neural signal processor, for any value of $k$, $k = 1, 2, \ldots$, has a
structure similar to that indicated in Equation 5.6.
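
The structure indicated in Equation 5.6 can be traced on a small discrete example. The sketch below is a simplified illustration under assumptions that are not taken from the thesis: two hypothetical processors with crisp (threshold) decisions, a hypothetical aggregation that counts the active decisions, and a four-point sample of the input plane.

```python
import numpy as np
from collections import defaultdict

def decisions(x, weights, thresholds):
    """Crisp decisions of m processors: a_i = 1 if w_i.x >= theta_i, else 0."""
    return tuple(int(np.dot(w, x) >= th) for w, th in zip(weights, thresholds))

def aggregate(a):
    """Hypothetical aggregation: the response is the number of active decisions."""
    return sum(a)

# Two hypothetical processors whose measurement leaves are lines in the plane.
weights = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
thresholds = [0.0, 0.0]

grid = [np.array([p, q]) for p in (-1.0, 1.0) for q in (-1.0, 1.0)]

# Leaves on the product of the stems: decision tuples grouped by response.
product_leaves = defaultdict(set)
# Leaves on the input space: sample points grouped by response.
input_leaves = defaultdict(list)
for x in grid:
    a = decisions(x, weights, thresholds)
    product_leaves[aggregate(a)].add(a)
    input_leaves[aggregate(a)].append(tuple(x))
```

The leaf with response 1 gathers the decision tuples (0, 1) and (1, 0); on the input space it is therefore the union of two quadrants, each quadrant being an intersection of one discrimination leaf from each processor.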

Figure 5.6 illustrates a foliation induced on the input space due
to an aggregation of foliations due to a discrimination of a partitioning
(foliation) provided by (hyper) planes: the activation function has been
assumed to be sigmoidal and the foliations due to discrimination are
considered from two distinct processors. As indicated in this figure,
properties 1 and 3 of the activation function given by the axiom of
discrimination together with the foliation on the Cartesian product of
the stems of the foliations due to discrimination introduce a 'curvature'
in the leaves of the foliation on the input space due to aggregation even
if the leaves of the foliation on the input space due to measurement do
not exhibit any 'curvature.'

Figure 5.6: Illustration of a foliation due to aggregation
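
The appearance of 'curvature' can be checked on a two-processor example. The following is a minimal sketch, assuming a logistic sigmoid, measurements that simply read off the two input coordinates, an unweighted sum as the aggregation, and a response level of 1.2; none of these specific choices is taken from Figure 5.6.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def response(x1, x2):
    """Aggregate two sigmoidal decisions taken on the measurements x1 and x2."""
    return sigmoid(x1) + sigmoid(x2)

def level_curve_x2(x1, level=1.2):
    """Solve sigmoid(x1) + sigmoid(x2) = level for x2 (logit of the remainder)."""
    r = level - sigmoid(x1)
    return np.log(r / (1.0 - r))

x1 = np.array([-1.0, 0.0, 1.0])
x2 = level_curve_x2(x1)

# Three points of the same leaf: a non-zero cross product means that they are
# not collinear, so the leaf is not a straight line.
cross = (x1[1] - x1[0]) * (x2[2] - x2[0]) - (x1[2] - x1[0]) * (x2[1] - x2[0])
```

The measurement leaves of either processor are straight lines (constant `x1` or constant `x2`), yet the level curve of the aggregated response bends. For this symmetric example the level 1.0 is a special case in which the leaf degenerates to the straight line x2 = -x1.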

The preceding discussion, on the one hand, suggests the adequacy,
for representation, of a processing scheme involving a layered
organization of weighted linear combinations of the responses of sigmoidally
transformed weighted linear combinations of incident patterns (or
intermediate concepts), while, on the other hand, it raises the possibility
of novel processing schemes. Note that the statement of adequacy has
been stated more formally in Theorem 5.1.1 (p. 243). A suggestion of
processing schemes alternative to that indicated in Equation 5.1 (p. 238),
however, is not included in the scope of this thesis.

Before closing this brief presentation of the functional nature of
neural signal processors, I draw attention to the fact that the axioms of
neural signal processing are not restricted to only those stated in the
foregoing. However, it is important to realize that these axioms
characterize the information processing aspect of neural signal processors.
The context in which such processing is handled needs to be specified
by separate axioms that specify the choices to be made in the input
pattern space (vector space, topological manifold, symbolic space, etc),
decision space (discrete or continuous) and space of concepts (numerical
or symbolic).

Such axioms have, intentionally, been left open to accommodate
generality in the conceptual framework of neural signal processors, though,
in this thesis, I have been using the common choices (with minor
generalizations) of an input space embedded in the n-dimensional space $\mathbb{R}^n$,
where $n$ denotes the number of input channels, decisions restricted to
a (compact) subspace of $\mathbb{R}^m$, where $m$ denotes the number of decisions
being sought on patterns from the input space $\mathcal{X}$, and a space of concepts
(ie, responses of the neural signal processor) embedded in $\mathbb{R}^{m'}$, where $m'$
is the number of elements in the processor response $\mathfrak{y}$. All the spaces
are considered as (metric/normed) vector spaces of appropriate
dimensions: this choice stems from the present considerations of relevance in
discussions of neural signal processors.*

* A more complete picture of the perspective and computational power (in comparison
with other decision making frameworks like Turing Machines) of the paradigm of neural
signal processors will be obtained when all the spaces are considered as varieties of
category theory (Hartshorne, 1977), a topic outside the scope of this thesis.
5.3 Operational Interpretation of Neural Signal
Processors

Neurons, and neural signal processors, till now characterized as operators
between finite dimensional spaces will, in the ensuing discussion,
be extended to slightly more general domains, specifically function spaces:
consequently, extension of the neural processing paradigm to functional
and operator spaces, and thereby, to the space of neural signal
processors, can be expected.* This exercise is aimed at seeking a means
for unifying existing neural processing architectures in terms of
operations familiar in the context of signal processing. For simplicity of
presentation, only the formal neuron model corresponding to the steady
state solution under additive dynamics will be considered, noting that
extensions to other categories of neural models follow similar reasoning.

I begin with the functional structure of an isolated (hypothetical) 
neuron. The formal model of a neuron, as discussed in § 2.2, describing 
the steady state solution (when the input is unvarying in time) is given, 
with the usual conditions, by 

$$y(x) = \sigma\Bigl( \sum_{i=1}^{n} w_i x_i - \theta \Bigr) = \sigma(w \cdot x - \theta) . \tag{5.7}$$

It is common knowledge that the inner product operation is available over
Euclidean spaces as well as function spaces, and this feature of the inner
product operation will govern the definition of neurons over function
spaces. Thus, the manifolds $\mathcal{X}$ and $\mathcal{Y}$ will, in the present article, be
considered as function spaces and for reasons of distinction be denoted
by $\mathfrak{X}$ and $\mathfrak{Y}$ respectively. Similarly, weights are drawn from a function
space denoted by $\mathfrak{W}$ as distinct from $\mathcal{W}$ used in the case of neurons over
Euclidean spaces.

* In this thesis, however, the scope of the neural processing paradigm has been
restricted to pattern spaces, with the associated interpretation of (in general, discrete)
signal spaces. Extensions to functional, operator (specifically, neural processor spaces
$\mathfrak{N}_k$, $k = 1, 2, \ldots$), functor spaces and other varieties of category theory
(Krishnan, 1981) will be reported elsewhere.


Definition 5.3.1 The formal model* of an isolated neuron on a
function space $\mathfrak{X}$ weighted by a function in $\mathfrak{W}$ is given by

$$\eta(x) = \langle w(\gamma), x(\gamma) \rangle - \theta \tag{5.8a}$$

$$y(x) = \sigma(\eta(x)) , \tag{5.8b}$$

where $w \in \mathfrak{W}$ and $x \in \mathfrak{X}$ are functions defined on the (entire) real
number field, $\gamma \in \mathbb{R}$ is the continuous valued index of the functions $x$
and $w$, $\theta \in \mathbb{R}$ is the threshold, and $\langle \cdot, \cdot \rangle$ is the inner product operation
between functions:

$$\langle w(\gamma), x(\gamma) \rangle = \int d\gamma \, w(\gamma) x(\gamma) . \tag{5.9}$$

(Note that as this discussion is based only on real valued functions $x$
and $w$, complex conjugation in the inner product between functions has
not been incorporated.)

* In order to extend the formal model of neurons with dynamics to function spaces, we
need to replace inner products between vectors by inner products between functions.

† Note that in neurons, and in neural signal processors, the measurement, discrimination,
and aggregation (not incorporated in neurons) stages are still operators, though
between function spaces.
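
A neuron over a function space can be evaluated numerically by replacing the integral of Equation 5.9 with a quadrature rule. The sketch below is illustrative and rests on assumptions that are not in the thesis: the functions are restricted to the interval [0, 1] rather than the entire real line, the threshold is taken to enter as in Equation 5.7, the activation is tanh, and the particular functions `w` and `x` are hypothetical.

```python
import numpy as np

def inner_product(w, x, gamma):
    """Approximate <w, x> = integral of w(gamma) x(gamma) d gamma (trapezoid rule)."""
    f = w(gamma) * x(gamma)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(gamma)))

def neuron(w, x, theta, gamma, sigma=np.tanh):
    """y(x) = sigma(<w, x> - theta) for functions w and x sampled on gamma."""
    return float(sigma(inner_product(w, x, gamma) - theta))

gamma = np.linspace(0.0, 1.0, 10001)   # sampled index of the functions
w = lambda g: 2.0 * g                  # hypothetical weighting function
x = lambda g: 3.0 * g ** 2             # hypothetical input function

eta = inner_product(w, x, gamma)       # integral of 6 g^3 over [0, 1] is 1.5
y = neuron(w, x, theta=1.5, gamma=gamma)
```

With a piece-wise constant `w` and `x` the same quadrature reduces to a weighted sum of finitely many products, which is the sense in which the functional inner product contains the vector inner product.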
It is worthwhile to observe that the above definition subsumes the
formal model of neurons on finite dimensional spaces, noting the elementary
principle that the functional inner product subsumes the vector inner
product. We note that the extension of the neural definition to function
spaces does not, however, improve the separation capability in comparison
with neurons defined on Euclidean spaces. Neural signal processors
over function spaces are defined similarly to those over Euclidean (pattern)
spaces (see § 5.1 and § 5.2) with the specific difference that vector
inner products are replaced by inner products between functions, and
thereby inherit all the taxonomical, architectural, and representational
peculiarities discussed earlier. The response of a type-$k$ neural signal
processor will, for notational convenience, continue to be denoted
by $\mathfrak{y}$. I introduce the following in order to facilitate an appreciation
of the operational character of neural signal processors.


Definition Neural signal processors of type-$k$, $k = 1, 2, \ldots$,
are defined by the (informal) operator equations

… (5.10a)

… (5.10b)

… (5.10c)

where $\epsilon^{(i)} \in E^{(i)}$ is the index for the collection of decisions in layer
$i$, $\gamma^{(i)} \in \Gamma^{(i)}$ is the index for the collection of (sub) concepts in layer
$i$, $i = 1, 2, \ldots, k$; $\eta$, $t$, $\sigma$, $a$, and $b$ have the same interpretation as in
Equation 5.1; $x \in \mathfrak{X}$ is the (real valued) input function, $\theta$ is the (real
valued) threshold function, $y$ is an operator whose values are decision
functions, and $\mathfrak{y}$ is an operator whose values are the neural signal processor
responses, ie,

… , $i = 1, 2, \ldots, k$,

and $\mathfrak{y}^{(0)}(x, t) = x$, $\forall t \in \mathbb{R}_{0,+}$.
$K_w^{(i)}(\cdot, \cdot) \in \mathfrak{W}$ denotes the feed-through measurement
kernel,* $K_e^{(i)}(\cdot, \cdot) \in \mathfrak{E}$ denotes the 'measurement kernel due to
lateral interaction,' and $K_v^{(i)}(\cdot, \cdot) \in \mathfrak{V}$ denotes the 'aggregation
kernel,' where $\mathfrak{W}$, $\mathfrak{E}$, and $\mathfrak{V}$ are collections of admissible
weighting functions, … , where $w$, $e$, and $v$ are weighting functions (or vectors)
defined on lines similar to that of Definition 5.2.1. Concept definition spaces
$\Gamma^{(i)}$, $i = 0, 1, 2, \ldots, k$, and decision definition spaces $E^{(i)}$,
$i = 1, 2, \ldots, k$, are each
allowed to be either continuous or discrete (and finite). In case any of
these spaces is discrete, integration over the corresponding space is to
be understood in the sense of summation. $\rho_w$, $\rho_e$, and $\rho_v$, with appropriate
layering indices, denote the measures with respect to which integration
is carried out. The propagation delays $\tau$ between processing nodes are
assumed, for convenience, to be unity while considering lateral
interaction and zero for feed-through connections.

* The feed-through measurement kernel is equivalent to the matrix W considered in
the proof of Proposition 5.2.2 (p. 251).


Note that closure of neural signal processing over the space of concepts
is implicit in the definition due to the assignment $\mathfrak{y}^{(0)}(x, t) = x$,
$\forall t \in \mathbb{R}_{0,+}$. The discussion presented in the foregoing, though initiated
in the context of neurons defined on function spaces, can be easily translated
to the case of processors realized on Euclidean spaces noting the
conceptual similarities in the two cases: however, for reasons of convenience,
the translation to neurons on Euclidean spaces is not being
presented. In the case of processors realized on Euclidean spaces ($\mathbb{R}^n$),
the integral transforms are of the discrete variety.*

* Though, in all of the preceding discussion, the inputs have always been considered
as linear arrays, this being generalized to functions, the operational aspect of neural
signal processing is not restricted to such an interpretation. By an appropriate choice of
variables, the kernels of the integral transforms can be revised to have interpretations
of processing multi-dimensional signals (images) without the need for an interpretation
in terms of arrays of functions. Such an interpretation would be essential in situations
wherein the features and concepts have to be interpreted as related to sub-images of the
incident image. Such interpretations have been suggested in the literature.

A few remarks related to the aggregation kernels are in order. If we
appreciate the relationship between decisions and concepts and consider
the possibility of deriving the (logically) necessary and/or sufficient
decisions given concepts, then the process, essentially one of
de-aggregation, is the inverse of measurement and neural signal processors,
in this sense, associate the integral transforms of measurement
and de-aggregation. However, as inverses of the integral transforms
are not assured, and are not being imposed either, such a view,
though accurate, would not be operationally feasible. For this reason,
the paradigm of neural signal processing is being considered as
non-linear associations between the integral transforms of measurement
and aggregation. Neural signal processors between function spaces are
characterized as in the following.

Theorem 5.3.1

1. Measurement and aggregation operations in neural signal processors
are integral transforms.

2. In type-1 processors, for all $t \in \mathbb{R}_{0,+}$, these transforms are
linear.

3. Measurement operations of type-$k$ neural signal processors, viewed
with respect to the input, are non-linear integral transforms:
non-linearity in the kernel is related to 'curvature' in the leaves of the
foliation induced on the input function space.

4. Activation functions impose point-wise non-linear relationships
between the integral transforms of measurement and aggregation.

Corollary 5.3.1 Neural signal processors between finite
dimensional spaces are discrete (linear) integral transforms.
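
On finite dimensional spaces the two transforms are matrix products, and the activation function acts element by element between them. The following is a minimal sketch of that arrangement, assuming zero thresholds, a tanh activation and small hand-picked kernels `W` and `V`; these are hypothetical values chosen for illustration, not quantities defined in the thesis.

```python
import numpy as np

def measure(W, x, theta):
    """Discrete measurement transform: eta_j = sum_i W[j, i] x[i] - theta[j]."""
    return W @ x - theta

def aggregate(V, y):
    """Discrete aggregation transform: response_l = sum_j V[l, j] y[j]."""
    return V @ y

def processor(W, V, theta, x, sigma=np.tanh):
    """Point-wise nonlinearity sigma placed between the two transforms."""
    return aggregate(V, sigma(measure(W, x, theta)))

W = np.array([[1.0, 0.0], [0.0, 1.0]])   # hypothetical measurement kernel
V = np.array([[1.0, 1.0]])               # hypothetical aggregation kernel
theta = np.zeros(2)
x1 = np.array([1.0, 0.0])
x2 = np.array([1.0, 0.0])

# Each transform is linear on its own (with zero thresholds) ...
lin_ok = np.allclose(measure(W, x1 + x2, theta),
                     measure(W, x1, theta) + measure(W, x2, theta))
# ... while the composite map through sigma is not additive.
additive = np.allclose(processor(W, V, theta, x1 + x2),
                       processor(W, V, theta, x1) + processor(W, V, theta, x2))
```

Here `processor(W, V, theta, x1 + x2)` evaluates to tanh(2) while the sum of the individual responses is 2 tanh(1), so the non-linearity of the overall map comes entirely from the point-wise association between the two linear transforms.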
Though these statements are trivial consequences of the above definition,
it is important to note that the operational paradigm of representation
in neural information processing is as specified in the following.

5.3.2 Information processing in neural signal processors is
effected by point-wise nonlinear associations between (nonlinear) integral
transforms:

$$x(t) \xrightarrow{\;K_w^{(\ell,0)}(t)\;} \eta(t) \xrightarrow{\;\sigma\;} y(t) \xrightarrow{\;K_v^{(k,\ell)}(t)\;} \mathfrak{y}(t) .$$

The neural information processing paradigm is captured effectively by
the above display with the understanding that $x(t)$, $y(t)$, and $\mathfrak{y}(t)$
are spatially defined operators indexed in $t \in \mathbb{R}_{0,+}$ (commonly interpreted
as time), and, for some a priori chosen $\ell \in \{1, 2, \ldots, k\}$, $K_w^{(\ell,0)}(t)$
is the effective feed-through measurement kernel operating between
the inputs and layer $\ell$, and $K_v^{(k,\ell)}(t)$ is the effective aggregation kernel
operating between layer $\ell$ and the outputs. This immediately allows
an appreciation of neural signal processing in the light of conventional
signal processing (see § 2.1).

Neural signal processing relates to conventional signal processing 
in the following ways: 
1. While both approaches are operationally rooted in the realization
of processors as associations between integral transforms, the im- 
portant difference is that while conventional signal processing is 
based on a choice of integral transforms independent of the fam- 
ily of processors being realized and the synthesis of processors is 
guided by the choices on the mechanism of association between 
the integral transforms, neural signal processing considers the 
mechanism of association to be independent of the processor fam- 
ily, the realization being accomplished through a search for the 
appropriate kernels of the integral transforms, the latter aspect 
being appreciated as learning. This operational interpretation is 
strengthened by Kolmogorov's theorem on function representation 
(see § 5.4). 

2. With signal processing viewed as the distinct stages of feature
extraction, decision making and signal reconstruction in the conventional
approach, it is imperative that the equivalent of the aggregation
integral transform be the inverse of the equivalent of the measurement
integral transform, ie, measurements are the same as
de-aggregations: this requirement is related to the signals in the
input and output space being of similar nature and interpretative
content. Neural signal processing, on the other hand, is not restricted
to this interpretation and, hence, it is not common to find
the kernels of aggregation and measurement integral transforms
being compared. It would not be incorrect to suggest that while
the 'basis' functions through which the desired processing functionality
is realized are chosen to be invariant in conventional signal
processing, in neural signal processing these 'basis' functions are
synthesized in accordance
with the processing requirement, thereby instilling confidence in
claims that the neural approach to signal processor realization is
well suited as a generic representational framework.

3. In conventional signal processing, in particular, approaches that 
enjoy linearity in the constituent operations, the mechanism of 
association between integral transforms providing the key to pro- 
cessor realization, it is not uncommon to find this mechanism 
being identified as the transfer function between signals in the 
input and output space: the transfer function, in view of being 
an association between integral transforms, is invariably given 
an interpretation in the spectral domain. The transfer function 
approach is not sustainable in the neural signal processing con- 
text as the mechanism of association is, in general, invariant to 
the family of processors being realized and this mechanism is re- 
quired to exhibit a nonlinear dependence, of outputs on inputs, in 
order to facilitate categorization. 
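
The conventional side of this comparison can be made concrete with the discrete Fourier transform. The sketch below is one illustrative instance, not a method of this thesis: the transform kernels are fixed (the DFT and its inverse), and the processor is specified entirely by the transfer function `H`, here derived from a hypothetical two-tap averaging filter `h`.

```python
import numpy as np

def conventional_filter(x, h):
    """Fixed DFT kernels; the processor is specified by the transfer function H."""
    H = np.fft.fft(h, n=len(x))                  # association between the transforms
    return np.real(np.fft.ifft(H * np.fft.fft(x)))

def circular_convolution(x, h):
    """Direct evaluation of the same processor in the signal domain."""
    n = len(x)
    return np.array([sum(h[k] * x[(i - k) % n] for k in range(len(h)))
                     for i in range(n)])

x = np.array([1.0, 2.0, 0.0, -1.0])
h = np.array([0.5, 0.5])                         # hypothetical two-tap averager
y_spectral = conventional_filter(x, h)
y_direct = circular_convolution(x, h)
```

In the neural case the roles are exchanged: the association (the activation function) is fixed in advance, and it is the kernels corresponding to the two transforms that are searched for during learning.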


5.4 Representation in Neural Signal Processors

Neural signal processors, independent of the type number, have been
shown, in the foregoing, to approximate continuous functions and operators
on compact (measurable) spaces with any desired degree of
accuracy. This statement, applicable for functions and operators realized
through static as well as dynamic processors, is true, however,
only when the discriminatory requirement Equation 5.4 (p. 246) is met
by the activation function $\sigma$ at all the participating processing nodes.
Activation functions that are smooth at the asymptotes and can be
characterized as having alternating stretches of monotonic variation
easily satisfy this discriminatory property: typical examples are the
sigmoidal and Gaussian functions (see § 2.2).

The theorems (see § 5.1) relating to the representation (in the sense
of approximation) of continuous functions assure an arbitrary degree of
approximation accuracy only in the asymptotic situation, ie, the
approximation error decreases as the number of processors increases towards
$\infty$, and remain silent on the character of representation when a finite
number of processors is involved, and on the more important issue
of specifying the number of processors required in a given processing
context. Originally addressed by Lippmann (1987), Hecht-Nielsen
(1987c) and Baum & Haussler (1989) in different contexts, with
follow-ups by several other investigators, the problem of specifying the
size of a network given a specific processing situation does not yet have
a consensus on the solution criterion, and the literature abounds
with several 'rules of thumb', or heuristic approaches.*

* It is worth pointing out that the number of nodes in the 'hidden layer' as a function
of the 'dimensionality' of the input space (more accurately, the number of elements in the
input patterns) is debated, quite occasionally, in the Internet discussion forum supported by
the newsgroup comp.ai.neural-nets. Dominant heuristics suggest that the number of
(hidden) nodes in processors with a single decision layer (essentially members of $\mathfrak{N}_1$)
be related to a mean value based on the number of input and output elements: both
arithmetic and geometric means have been proposed.

Recalling, once again, the theorems of function representation in
neural signal processors, the property of denseness in approximation
is not restricted to a specific range of processor types, ie, for all $k$,
$k = 1, 2, \ldots$, $\mathfrak{N}_k(t)$, $\forall t \in \mathbb{R}_{0,+}$, is dense in the space of
continuous functions. This means that from the point of view of function approximation
without any restriction on the number of processing nodes, layering is
insignificant in the neural processing paradigm. While this observation
does not cause much consternation in feed-forward, non-evolutionary
networks, an immediate implication in processors supporting evolution
in the response (state) through recurrence (eg, Hopfield circuit)
or lateral interaction (eg, Kohonen layer), and enjoying the luxury of
an arbitrarily large number of discrimination nodes, is that the relevant
attractor (or fixed point) should be reached in one computational step.
Such a claim is not sustainable unless the class of processors under
consideration is trivial, and, hence, it is imperative that a closer look
be given to the idea of layering in neural networks.

In the following, the issue of representation with a finite number of
processing nodes in layered neural networks will be in focus. The (finite)
processing structures suggested by an application of a function
representation theorem due to Kolmogorov (1957b) are cursorily looked
into. A proof of this function representation theorem due to Arnold
(1958) justifies functions realized by neural signal processors to be
characterized in terms of level regions, ie, leaves of the foliation, on the input
space. Noting that neural signal processors establish point-wise nonlinear
associations between (discrete) integral transforms, the kernels
of these transforms are identified with the individual processing stages
in the representation scheme central to this theorem. The discussion
will be rounded off with remarks on the realization of the kernels.

Kolmogorov (1957b), in a study of Hilbert's 13th problem (cf,
Lorentz, 1962), has shown that continuous real valued multi-variate
real functions (on the unit hypercube $E^n$) have a representation of a
form equivalent to

$$f(x) = \sum_{q=0}^{2n} \chi_q\Bigl( \sum_{p=1}^{n} \pi_{pq}(x_p) \Bigr) , \qquad n = 2, 3, \ldots , \tag{5.11}$$

where the functions $\chi$ and $\pi$ are continuous real valued real functions
defined on $E^1 = [0, 1]$, and the choice of functions $\pi_{pq}$ (each
being monotone increasing, and $\pi_{p_1 q_1} \neq \pi_{p_2 q_2}$ if $(p_1, q_1) \neq (p_2, q_2)$,
$p_1, p_2 = 1, 2, \ldots, n$, $q_1, q_2 = 0, 1, \ldots, 2n$) is independent of the class of
$n$-variate functions $\{f\}$ to be represented. The problem of representation
is formulated as one involving appropriate choice of functions $\chi$ given
a specification of the particular function $f$ to be represented.*
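
The computational structure of Equation 5.11, with 2n + 1 outer functions each applied to a sum of n inner functions, can be sketched as follows. The functions used here are illustrative assumptions and are not Kolmogorov's functions: the inner functions are all taken to be the logarithm (monotone increasing, but defined only for positive arguments and not mutually distinct as the theorem requires), and only one outer function is non-zero, which suffices to represent the product of two variables.

```python
import math

def superposition(x, chi, pi):
    """Evaluate sum_q chi[q]( sum_p pi[p][q](x[p]) ) for q = 0..2n, p = 0..n-1."""
    n = len(x)
    assert len(chi) == 2 * n + 1
    total = 0.0
    for q in range(2 * n + 1):
        inner = sum(pi[p][q](x[p]) for p in range(n))
        total += chi[q](inner)
    return total

n = 2
# Illustrative choice only: these are not Kolmogorov's functions.
pi = [[math.log] * (2 * n + 1) for _ in range(n)]   # monotone increasing inner stage
chi = [math.exp] + [lambda s: 0.0] * (2 * n)        # only q = 0 contributes

value = superposition([0.5, 0.8], chi, pi)          # represents f(x1, x2) = x1 * x2
```

The sketch shows only the form of the representation: in the theorem the inner functions are fixed once for all functions of n variables, and it is the outer functions that change with the function being represented.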

* Observe that the problem of function representation considered by Kolmogorov
(1957a) is very similar, in an operational sense, to that initiated by Rosenblatt (1958) in
the study of perceptrons. The perceptrons of Rosenblatt are two-layered neural networks
with weight adaptivity allowed only in the second layer. The weights of the first layer
are chosen to be appropriate for a 'problem domain' and are independent of the specific
processing function to be realized. A similar correspondence has been established, in the
literature, between the function representation theorem of Kolmogorov and the CMAC
architecture of Albus (1975).

The similarity in the forms of computational specification in the
representation suggested by Kolmogorov, and layered† neural networks
(ie, type-2 neural signal processors), has motivated Hecht-Nielsen
(1987c), Kurkova (1992) and Kovacec & Ribeiro (1993), among
others, to claim that multi-layered neural networks are capable of
representing all continuous functions of interest. However, noting that the
processing structure suggested by Kolmogorov recommends decision
functions $\pi_{pq}$ to be applied directly on the inputs $x_p$, for all stages
$q$ (with appropriate quantification on $p$ and $q$), ie, the incident input
patterns are subjected to decision-making without any preprocessing,
Kolmogorov's theorem is not readily applicable to characterize the
input-output map provided by neural signal processors: this feature
has prompted some investigators to comment that Kolmogorov's theorem
is not relevant to neural networks. I suggest an interpretation of
the function representation theorem by Kolmogorov, in the context of
neural signal processing, through the following theorem.

† In the literature, such networks are also termed as being three-layered when the
input terminations are counted as a layer. The prevailing lack of consensus in numbering
neural networks, and the compulsions for the present nomenclature have been indicated
in Chapter 2.

Theorem 5.4.1 Scalar processors in $\mathfrak{N}_k(t)$, $k = 1, 2, \ldots$, for all
$t \in \mathbb{R}_{0,+}$, with sigmoidal (or monotone increasing) activation function,
having two or more nodes in the first layer of processing (ie, $m_1 = 2, 3, \ldots$),
and the feature vector $\eta^{(1)}$ restricted to a bounded (linear) subspace
isomorphic with $E^{m_1}$ for all $x$ and $t \in \mathbb{R}_{0,+}$, represent
continuous functions of the form

$$\mathfrak{y} = \sum_{q=0}^{2m_1} \chi_q\Bigl( \sum_{p=1}^{m_1} \pi_{pq}\bigl(\eta^{(1)}_p\bigr) \Bigr) ,$$

where the functions $\pi$ satisfy the requirements stated in connection with
Kolmogorov's theorem, and $\chi$ depend on $\mathfrak{y}$.

This interpretation assures that multi-layered neural networks with 
a finite number of processors, the number depending on the dimension 
of the initial (ie, first layer) feature space, are capable of providing
the desired function representation: in this light, the relevance of Kol- 
mogorov's representation theorem in neural signal processing is in the 
sense of a statement of representational complexity of concepts, given 
the features, rather than a suggestion for achieving the representation. 
However, unlike in conventional approaches to neural signal processing, 
the desired processing task is to be explicitly decomposed into distinct 
stages of feature extraction and discrimination. 

Concepts $\mathfrak{y}$, described on the input patterns $x$, are equivalently
appreciated as mappings $\mathfrak{S}$ on the initial feature vector $\eta^{(1)}$, and this
interpretation, though useful in the hybrid approach to automated
intelligence, is restricted to a discrimination on features extracted, from
the incident signal, using a priori specified hypotheses. The above
proposition suggests that, akin to the notion of preservance in processors
defined on discrete input spaces (see Chapters 3 and 4), the
representation of the input space in the measurements ($\eta^{(1)}$), ie, the input
space foliation due to measurement functions, plays a crucial role in the
representational nature of neural signal processors. While the above
interpretation assures finiteness in the number of processing nodes,
this cannot be used to decide the number of layers required in a neural
signal processor.

Similarities, of form, in the function representation schemes of layered
neural networks and that in the interpretation offered by Theorem 5.4.1
(p. 288) are tempting enough to associate processing stages
$\chi$ and $\pi$ with the components of neural signal processors. The function
representation claims of Hecht-Nielsen (1987c), Kurkova (1992) and
Kovacec & Ribeiro (1993), are, in fact, based on such a comparison
of operational forms. While these claims have stressed identifying
$\chi$ and $\pi$ with the activation functions, typically monotonic (as required
by Kolmogorov's proof), such identification can only be heuristic rather
than a rigorous justification.

Noting that neural signal processors establish point-wise nonlinear
associations between integral transforms, it is interesting to identify
the functions χ and π in terms of the kernels of these transforms and
the activation functions of the processing nodes. This association is
being sought from the point of view of understanding the nature of
abstraction involved in the mapping on the feature vector rather than to
correlate, node by node, the two representation schemes. For simplicity
of reasoning, I will consider only feed-forward neural signal processors
on finite-dimensional input spaces.


Proposition 5.4.1 The χ and π functions of Theorem 5.4.1 (p. 288)
are related to the aggregation kernels.


Proof: Noting that the responses of type-k neural signal processors
can be written as

m*; mi 

it is not difficult, on a term by term comparison with the
representational form in Theorem 5.4.1 (p. 288), to see that the π
functions are linearly dependent on the activation functions σ and that
the χ functions are nonlinearly dependent on σ, the nature of
nonlinearity being decided by the kernel effective in the transformation
of concepts realized by a single layer neural signal processor into
measurements at the final layer of a type-k neural signal processor.

□ 


In order that the nature of the kernels effected in the transforma- 
tions across layers is understood, the following is introduced. (See 
Hazewinkel, 1988.) 



292 


Chapter 5 Neural Signal Processing Architectures 


Definition 5.4.1 A transformation T given by

\[ T(x) = \int_a^b K(t, s, x(s)) \, ds, \]

where K(t, s, u), a ≤ t, s ≤ b, −∞ < u < +∞, is a function such that

\[ g(t) = \int_a^b K(t, s, x(s)) \, ds \]

is continuous on [a, b], for any x(s) in C([a, b]), and is nonlinear in u,
is termed a nonlinear Urysohn operator mapping C([a, b]) into itself.
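
As a hedged illustration of the definition above, the following sketch
discretizes a Urysohn operator on a uniform grid using trapezoidal
quadrature. The names urysohn_apply and example_kernel, the tanh kernel
and the grid size are assumptions made purely for illustration; none of
them appears in the thesis.

```python
import numpy as np

def urysohn_apply(kernel, x_samples, a, b):
    """Approximate g(t_i) = int_a^b K(t_i, s, x(s)) ds on a uniform grid.

    kernel    : callable K(t, s, u), vectorized over NumPy arrays
    x_samples : samples of x(s) on a uniform grid over [a, b]
    """
    x_samples = np.asarray(x_samples, dtype=float)
    n = x_samples.size
    grid = np.linspace(a, b, n)
    # Trapezoidal quadrature weights on the uniform grid.
    weights = np.full(n, (b - a) / (n - 1))
    weights[0] *= 0.5
    weights[-1] *= 0.5
    t = grid[:, None]          # output variable, one row per t_i
    s = grid[None, :]          # integration variable, one column per s_j
    u = x_samples[None, :]     # signal value x(s_j)
    values = np.broadcast_to(kernel(t, s, u), (n, n))
    return (values * weights[None, :]).sum(axis=1)

def example_kernel(t, s, u):
    """Hypothetical kernel, chosen only because it is nonlinear in u."""
    return np.tanh(t * s + u)

grid = np.linspace(0.0, 1.0, 201)
x = np.sin(np.pi * grid)
g = urysohn_apply(example_kernel, x, 0.0, 1.0)
```

Because the kernel is nonlinear in its third argument, the discretized
operator does not commute with scaling of the input; applying it to 2x
does not give 2g.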

A discrete version of this operator can be immediately visualized. 
From the above definition and the preceding proposition the following 
is immediately evident. 

Proposition 5.4.2 The kernels of measurement and the kernel of
aggregation in type-k neural signal processors belong to the class of
kernels of Urysohn operators.

This statement implies that as a representational paradigm, asso- 
ciation between integral transforms, synthesized as the influence of 
multi-layered neural signal processors, is not vacuous. In addition,
the Urysohn-Brouwer Lemma (Hazewinkel, 1988), an assertion on
the possibility of extending a continuous function from a subspace of a
topological space to the whole space, in the context of an interpretation,
in Chapters 3 and 4, of generalization as function extension, provides
the basis for an assurance that learning and generalization, the central
issues of neural signal processing, can be easily incorporated in
discussions involving processors defined on abstract spaces. This thesis
being of limited scope, however, learning and generalization in
processors with abstract neurons will not be taken up.

In § 5.2 I have suggested the plausible axioms for a discourse on
representation with neural signal processors. These axioms provide
a characterization of the admissible structure in the components of a
neural signal processor, viz, the activation function and the kernels of
the integral transforms of measurement and aggregation. As stated
earlier, the motivation for seeking the axioms of neural signal
processing, fully recognizing the empirical nature of investigations in
neural networks, is to provide a framework that would aid a unified
approach in the understanding of the nature of representation in neural
networks and related 'automata': the unification, however, has not been
included in the scope of this thesis.

Recall the axiom of discrimination and consider the activation
functions, other than the sigmoid function (including hard-limiter), that
are admissible. As suggested in the illustration in Figure 5.5 (p. 270),
the admissible activation functions have one or more local 'bumps'. Such
activation functions have been suggested, in the literature, to be
superior to the sigmoidal function for purposes of approximation. (See
Poggio & Girosi, 1990, for claims regarding the superiority of
regularization networks, also called radial basis function networks, over
approximation through type-1 neural signal processors with sigmoidal
activation function in all the processing nodes.)

In view of Theorem 5.1.1 (p. 243) and Proposition 5.1.2 (p. 247), all
activation functions admissible under the axiom of discrimination share
the property that the collection of functions realized in a type-k,
k = 1, 2, ..., neural signal processor that incorporates these activation
functions is dense in the space of continuous functions. Geva & Sitte
(1992) have suggested a constructive procedure for realizing local
functions through a weighted superposition of (domain) translated sigmoid
functions; the weight values are mutually negative. A similar scheme
has been suggested by Zhang & Benveniste (1992) and Pati &
Krishnaprasad (1993); in these schemes, however, the linear combination
of (domain) translates of the sigmoid function, together with a (domain)
scaling and/or rotation, has been identified with wavelet transforms
(Chui, 1992; Daubechies, 1992). These investigations motivate the
following elementary statements.^®


Proposition 5.4.3 To every neuron defined as in Equation 5.8 (p. 277),
where the activation function σ is continuous and satisfies the axiom of
discrimination and the other symbols have the same connotations as
in Definition 5.3.1, there exists a corresponding functionally equivalent
type-1 feed-forward neural signal processor.


^®Though the following statements are being made in the general context of neurons
defined on function spaces, equivalent statements corresponding to neurons defined on
Euclidean spaces are immediately evident.


Proof: This statement is, simply, a consequence of Theorem 5.1.1.

Recall the processing model in Equation 5.8 (p. 277):

\[ \eta(x) = \langle w, x \rangle - \theta, \]

\[ y(x) = \sigma(\eta(x)). \]

The approximability^® statement of Theorem 5.1.1 suggests that there
exist real numbers α_i, w_i and θ_i, i = 1, 2, ..., m, for some
appropriate finite (positive integer) value of m, such that

\[ \sigma(\xi) = \sum_{i=1}^{m} \alpha_i \, \sigma_s(w_i \xi - \theta_i), \]

where σ_s denotes the sigmoidal activation function. This implies that
the response of the neuron is given by

\[ y(x) = o(x) = \sum_{i=1}^{m} \alpha_i \, \sigma_s(\eta_i(x)). \]

Since the above expression is the operational model of a type-1
neural signal processor with a discrete aggregation kernel, the
statement is established noting that the equality between y and o is in
the approximation sense of Theorem 5.1.1 and that the measurements
η_i(x) represent the following evaluation:

\[ \eta_i(x) = w_i \langle w, x \rangle - w_i \theta - \theta_i, \quad i = 1, 2, \ldots, m. \]


□

^®The notion of 'approximability' is to be understood in the sense of computability
(Hopcroft & Ullman, 1989) extended to the context of function realization. Note that
function realization is one of the valid instances of computation. Denseness of the space
of realized functions in the space of desired functions, as suggested by Theorem 5.1.1
(p. 243) and Proposition 5.1.2 (p. 247), is really an assurance of the possibility (though, in
an asymptotic sense) of realizing any function, in the collection of desired functions, with
arbitrary accuracy. Approximability is not the same as computability, however. The
difference between these two notions arises due to the fact that no restriction of finiteness
of the number of computational steps, ie, component functions, is assured by
approximability, whereas computability necessitates finiteness in the number of
computational steps. Further, no component function participating in the approximation
is assured to be computable.
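
The construction used in the proof can be explored numerically. The
sketch below is only an illustration under stated assumptions: a Gaussian
bump stands in for a continuous non-sigmoidal activation, the slopes w_i
and thresholds θ_i are fixed by hand, and the coefficients α_i are found
by least squares. The function names and all numerical choices are
hypothetical and are not taken from the thesis.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, used here as the sigmoidal activation."""
    return 1.0 / (1.0 + np.exp(-z))

def bump(xi):
    """Assumed example of a continuous activation with one local bump."""
    return np.exp(-xi ** 2)

def fit_alpha(xi, slopes, thresholds):
    """Least-squares alpha_i in sum_i alpha_i * sigmoid(w_i * xi - theta_i)."""
    design = sigmoid(np.outer(xi, slopes) - thresholds)
    alpha, *_ = np.linalg.lstsq(design, bump(xi), rcond=None)
    return alpha

def type1_response(x, w, theta, alpha, slopes, thresholds):
    """Type-1 response built on eta_i = w_i*<w, x> - w_i*theta - theta_i."""
    eta_i = slopes * np.dot(w, x) - slopes * theta - thresholds
    return float(alpha @ sigmoid(eta_i))

m = 20                                            # number of sigmoidal nodes (assumed)
slopes = np.full(m, 4.0)                          # w_i, all equal (assumed)
thresholds = slopes * np.linspace(-4.0, 4.0, m)   # theta_i, centres spread over [-4, 4]
xi = np.linspace(-4.0, 4.0, 401)
alpha = fit_alpha(xi, slopes, thresholds)

approx = sigmoid(np.outer(xi, slopes) - thresholds) @ alpha
max_error = float(np.max(np.abs(approx - bump(xi))))
```

Every node of the resulting type-1 processor measures the same inner
product ⟨w, x⟩ and differs only in the scale w_i and offset, which is the
'same direction, different norm' situation of the weights noted in the
text.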

Corollary to Proposition 5.4.3 To every type-k, k = 1, 2, ..., neural
signal processor wherein the activation functions are continuous
and satisfy the axiom of discrimination, and the kernels of the integral 
transforms of measurement and aggregation are linear, there exists a 
corresponding functionally equivalent neural signal processor, of type 
no more than k, wherein all activation functions are sigmoidal, the 
kernels of the integral transforms of measurement and aggregation are 
linear and the complexity of evolution, if any, is unaltered. 

The above corollary to Proposition 5.4.3 follows from the proposition
of type number additivity (Proposition 5.2.2 (p. 251)) and
Proposition 5.4.3. Note that in the above Proposition and its corollary,
the measurement kernel of the type-1 neural signal processor,
corresponding to every processing node, is made of weighting functions
that are linearly dependent on each other. This situation, wherein the
weighting functions (weight vectors) of distinct nodes (of the equivalent
neural signal processor) are all in the same 'direction' but have
different norms, is complementary to that investigated in Chapter 4 in
connection with neural signal processors that are realized with
preservance weights in the first layer.

In Chapter 4, the weights of distinct nodes, in the first layer, are re- 
stricted to have different 'directions' but the same norm so that a choice 
of preservance weights corresponding to the (preservance) input space 
in all the processing nodes is still capable of maintaining distinctness 
of weights in the distinct nodes. Contrasting this, a representation of 
the activation function in the space of sigmoidal functions has necessi- 
tated the weights of distinct nodes to be in the same 'direction' but have 
different norms. 

Consider the case when a neural signal processor of the kind
suggested in Proposition 5.4.3 (p. 294) is operative on a (discrete)
preservance input space of the kind studied in Chapter 3, where w,
assumed non-null, is the associated preservance weight. This processor is
functionally equivalent to an isolated neuron whose activation function is
continuous, different from a sigmoidal function, and satisfies the axiom 
of discrimination. Note that w provides the common 'direction' for the 
weights of the processing nodes in the type-1 neural signal processor. 

In Proposition 3.2.13 (p. 152) I have shown that for every
preservance weight w of a discrete (preservance) input space there exists
at least one other weight (that is not the negative of w) which is
equivalent to w, in the sense of a representation of functions on the
(preservance) input space. An immediate implication of this equivalence
and the nature of representation in neural signal processors is that the
neural signal processor suggested by Proposition 5.4.3 provides a
representation scheme that involves superpositions of functions on
discrete spaces that are permutations (including scaling) of the
preservance input space. Put differently, a superposition of functions on
discrete spaces that are permutations (with scaling) of a preservance
input space is equivalent to a (type-1) neural signal processor operating
on the preservance input space with non-sigmoidal, continuous activation
functions that satisfy the axiom of discrimination.

Other than the axiom of discrimination, the axioms of measurement
and aggregation state, indirectly, the structural requirements on
the kernels of the integral transforms of measurement and aggregation,
ie, the weights associated with the distinct nodes in the neural
signal processor. The influence of the axiom of discrimination (and
preservance) on the choice of weights (and, thereby, the kernels) was
motivated by a realization of the activation function in a neural signal
processor. If, instead, the measurement functions η are realized
in a neural signal processor, the following is easily established; the
proof of these statements is similar to that for Proposition 5.4.3. In
the following statements, though the kernels of measurement integral
transforms are assumed to be non-linear (the measurement functions
are assumed to be polynomial discriminants), the kernels of the integral
transform of aggregation will be assumed to be linear. The processors
are all assumed to be non-evolutionary.

Proposition 5.4.4 To every homogeneous polynomial discriminant,
expressed compactly as a product of linear discriminants,

\[ \eta(x) = \prod_{i=1}^{j} \left( \langle w_i, x \rangle - \theta_i \right), \]

for every value of j, j = 1, 2, ..., w_i ∈ W, θ_i ∈ ℜ, i = 1, 2, ..., j,
there exists a corresponding functionally equivalent type-1 neural signal
processor with sigmoidal activation functions and linear discriminants.


Corollary to Proposition 5.4.4 To every high-order neuron defined as

\[ \eta(x) = \sum_{j=1}^{N} \prod_{i_j=1}^{j} \left( \langle w_{i_j}, x \rangle - \theta_{i_j} \right), \]

\[ y(x) = \sigma(\eta(x)), \]

for every value of N, N = 1, 2, ..., w_{i_j} ∈ W, i_j = 1, 2, ..., j,
j = 1, 2, ..., N, there exists a corresponding functionally equivalent
type-1 neural signal processor with sigmoidal activation functions and
linear kernels of the integral transforms of measurement and aggregation.

An incorporation of the representations of the measurement functions
η and the activation functions σ, in terms of type-1 neural signal
processors involving linear combinations of sigmoidal functions operating
on linear discriminants, immediately suggests a processing scheme
similar to that described by Equation 5.11 (p. 287), ie,

\[ o(x) = \sum_{j=1}^{N_1} \beta_j \, \sigma_s\!\left( \sum_{i_j=1}^{N_2} \alpha_{i_j} \, \sigma_s\!\left( \langle w_{i_j}, x \rangle - \theta_{i_j} \right) - \theta_j \right) \quad \forall x \in X, \tag{5.13} \]

the equality being in the sense of approximation. In this scheme, σ_s
denotes the sigmoidal activation function, the coefficients α and β and
thresholds θ, with appropriate suffixes, are real numbers and the
weighting functions w are members of the weight function space W.
N_1 and N_2 are finite positive integers which suggest the number of
components that participate in the representation (approximation).

Recursing into Theorem 5.4.1 (p. 288) with the attention restricted
to the variant of Equation 5.13 for neural signal processors defined on
Euclidean spaces, ie,

\[ o(x) = \sum_{j=1}^{N_1} \beta_j \, \sigma_s\!\left( \sum_{i_j=1}^{N_2} \alpha_{i_j} \, \sigma_s\!\left( w_{i_j}^{\top} x - \theta_{i_j} \right) - \theta_j \right) \quad \forall x \in X, \tag{5.14} \]

where σ_s denotes the sigmoidal activation function, it is easy to
observe the following. Representation in neural signal processors,
regardless of the type of activation functions and kernels used for the
integral transforms of measurement and aggregation, is in the sense of a
realization of the desired function as a linear combination of 'basis'
functions, these 'basis' functions themselves being synthesized as
sigmoidal transformations of the response of a type-1 neural signal
processor. (The synthesis of 'basis' functions has already been related
to the realization of the class of Urysohn operators.)

Note that the inner representation in Equation 5.13 and
Equation 5.14 refers to a realization of the measurement functions η due
to nonlinear measurement kernels and the outer decomposition
incorporates the synthesis of activation functions that satisfy the axiom
of discrimination through superpositions of sigmoidal transformations of
weighted measurements. The weights (functions w_{i_j}), coefficients
α_{i_j} and thresholds θ_{i_j} reflect the representation of the
measurement functions and influence the π functions of the network
structure suggested by the function representation theorem due to
Kolmogorov (1957a). In a similar way, the coefficients β_j and thresholds
θ_j incorporate aggregations of neural decisions on measurements (on the
incident patterns or signals), the neural synthesis of the decision
mechanisms (activation functions) and influence the χ functions of the
network structure in Theorem 5.4.1.
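
The two-level structure of Equation 5.14 can be sketched as a short
forward pass. The array layout, the function name type2_response and the
use of the logistic function as the sigmoid are assumptions made for
illustration only; the thesis does not prescribe an implementation.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, assumed here as the sigmoidal activation."""
    return 1.0 / (1.0 + np.exp(-z))

def type2_response(x, W, theta_inner, alpha, theta_outer, beta):
    """Evaluate o(x) for a type-2 processor on a Euclidean input space.

    x           : input vector, shape (n,)
    W           : inner weights w_{i_j}, shape (N1, N2, n)
    theta_inner : inner thresholds theta_{i_j}, shape (N1, N2)
    alpha       : inner coefficients alpha_{i_j}, shape (N1, N2)
    theta_outer : outer thresholds theta_j, shape (N1,)
    beta        : outer coefficients beta_j, shape (N1,)
    """
    # Measurement: linear discriminants w^T x - theta for every inner node.
    measurements = W @ x - theta_inner                        # (N1, N2)
    # Inner type-1 responses: weighted sums of sigmoids of the measurements.
    inner = (alpha * sigmoid(measurements)).sum(axis=1)       # (N1,)
    # 'Basis' functions: sigmoidal transformations of the inner responses.
    basis = sigmoid(inner - theta_outer)                      # (N1,)
    # Aggregation: linear combination of the 'basis' functions.
    return float(beta @ basis)

# Small deterministic example with N1 = 3, N2 = 4, n = 5.
N1, N2, n = 3, 4, 5
o = type2_response(
    x=np.ones(n),
    W=np.zeros((N1, N2, n)),
    theta_inner=np.zeros((N1, N2)),
    alpha=np.ones((N1, N2)),
    theta_outer=np.full(N1, 2.0),
    beta=np.ones(N1),
)
```

In this toy setting every inner sigmoid evaluates to 0.5, each inner sum
equals the outer threshold, and the response is therefore 0.5 per 'basis'
function.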

Borrowing the interpretations of the χ and π functions in the
representation scheme of Theorem 5.4.1, it is evident that the role of
every neural signal processor is to realize the given (signal) processing
functionality by synthesizing 'basis' functions that, in turn, reflect the
synthesis of the required mechanisms of measurement and
decision-making: these 'basis' functions operate on 'features' derived
from the incident pattern (signal). The architecture of the neural signal
processor required for a given (signal) processor realization is
therefore to be decided on the kind of measurements to be taken on the
incident patterns (signals) and the type of decisions to be effected on
the measurements. In this thesis, however, the specific aspect of
(automated) procedures for guiding a selection of architecture is not
attempted.

The above discussion on the influence of the axioms of neural sig- 
nal processing on the selection of the architecture of the neural signal 
processor is, in essence, an existential rephrasing of the constructive 
approaches suggested by Geva & Sitte (1992), Zhang & Benveniste 
(1992) and Pati & Krishnaprasad (1993). However, the implication 
that a type-2 neural signal processor with sigmoidal activation func- 
tions and linear discriminants is adequate for representing all func- 
tions realized by the entire class of (non-evolutionary) neural networks 
(and neural signal processors) described by the axioms of neural signal 
processing should not be missed. This implication has been the basis 
of the claims of Hecht-Nielsen (1987a), Kurkova (1992), Kovacec 
& Ribeiro (1993) and Lagunas, Perez-Neira, et al (1993). 

Recall Equation 5.14 (p. 300). In this equation, which describes a
type-2 feed-forward neural signal processor functionally equivalent to
some neural signal processor (of the non-evolutionary kind) described by
the axioms of neural signal processing, if the weights are restricted in
such a way that w_{i_j} = w_i, for an appropriate composition of the
indexing variable i_j in terms of the index variables i and j, then the
χ and π functions denote the following monotonic evaluations.


Xj( ) •— (* ) >
7/z, U) = -X.

(Refer to Theorem 5.4.1 with an appropriate change of variables.)

In the function representation scheme based on a solution, by
Kolmogorov, of Hilbert's 13th problem, the π functions are required to
be monotonic and independent of the family of functions being
represented. Dependence on the functions represented is effected only
through the χ functions. This implies that the representation of
measurement functions is independent of the family of processors being
realized. As a consequence, while the features are not restricted
to be independent of the specific processor being synthesized, the
preliminary evaluation, or preprocessing, of the features is required to
be independent of the family of processors synthesized. While this
interpretation is not new in the domain of conventional signal
processing, an implication of such an interpretation arising in the
context of connectionist signal processing is that the axioms of neural
signal processing under the earlier stated notion of representation (ie,
function synthesis in a 'basis' which is itself synthesized as a
sigmoidal transformation of the response of a neural signal processor)
suggest the possibility of a mediation between the notions of (and
approaches to) representation in traditional (symbolic) and connectionist
approaches to AI.

Learning, in the context of neural signal processors that are
interpreted as point-wise nonlinear transformations between integral
transforms, is a process involving a specification of the kernels of the
integral transforms. Though it is feasible to formulate learning as a
search, through an appropriately stated gradient descent, for an
appropriate kernel, such an approach is not considered in this thesis.
Instead, an alternate approach is suggested wherein the kernel,
recognized as a function on a suitable subspace of ℜ^{n_k}, for an
appropriate value of n_k, is synthesized through a neural signal
processor. (A characterization of kernels K(·, ·) as functions of two
points was discovered by J. Mercer in 1909. See Aronszajn, 1950, for this
historical aspect.)

Note that in a type-k neural signal processor, k = 1, 2, ..., the
measurement kernels are functions on Cartesian products of two index
sets: the index set for the collection of decision nodes in layer ℓ,
ℓ = 1, 2, ..., k, and the index set for the collection of concepts
(inputs when ℓ = 0), ie, responses of the neural signal processor, in
level ℓ, ℓ = 0, 1, 2, ..., k. Similarly, the aggregation kernels,
ℓ = 1, 2, ..., k, are functions on products of these index sets. When the
index sets of the collection of processing nodes as well as the concepts
are chosen to be one-dimensional and identified with a subset of ℜ, then
all the kernels are functions on appropriate subsets of a
finite-dimensional Euclidean space.

A synthesis, or design, of the kernels of integral transforms of a 
neural signal processor through an appropriately chosen neural signal 
processor necessitates examples of association between the domain and 
range of the kernels, ie, a training set that effectively is an a priori
specification of the weights to be identified with some of the channels
in the layer corresponding to the kernel under consideration. Such a priori
information about some (or all) of the weights of a few of the processing 
nodes in each layer is available when the structure of the networked 
ensemble is partially known, as eg, in the case of hybrid neural net- 
works. The approach of realizing kernels of integral transforms of a 
neural signal processor in the neural signal processing paradigm will 
be taken up in Chapter 6. 


5.5 Summary 

Neural signal processors, defined to be members of a typed class of
abstract dynamical systems (the type number specifies the degree of
association in the sense of layering), have been shown, recognizing the
functional association to be time-indexed statements of spatial
correlation in the incident input patterns, to represent continuous
functions with arbitrary accuracy when the activation functions are
continuous and non-constant. Denseness in representation is independent
of the degree of association.

The principal focus in the study of the functional nature of neural 
signal processors has been to formulate the axioms relevant in neural 
signal processing: these axioms are listed below. 

1. Axiom of Organization. 

A neural signal processor is composed of (layers of) three operational
stages: measurement, discrimination and aggregation in
that order. Preprocessing, if any, (preceding, or incorporated in, 
the measurement) is sought to be represented in a neural basis. 
Measurements are effected on an observation space constructed as 
the Cartesian product of the input space and a relevant subspace 
of a union of the space of responses of the distinct layers. 

2. Axiom of Measurement. 

A neural signal processor, through the measurement functions in 
each of the processing (decision making) nodes, induces a foliation, 
of codimension at least one, in the input manifold. This foliation 
forms the basis of synthesizing (approximating) the desired level 
curves of the function. 

3. Axiom of Discrimination.

A neural signal processor, through its discriminatory functions, 
renews the foliations, induced on the input space by the mea- 
surement functions, through a transformation, of the stems of the 
foliations, with at least one of the following properties: 

(a) alter the indexing of leaves to retain distinctness in a finite 
non-zero number of local regions of the input space, 

(b) introduce multiple components in the leaves, 

(c) associate, to at least one component of a leaf of the foliation
due to discrimination, uncountably many leaves of the
foliation due to measurement.

Re-foliations provide the basis for establishing equivalences between
members (elements) of the input space in ways not possible
through the chosen measurement functions.

4. Axiom of Aggregation.

A neural signal processor, through its aggregation function, syn- 
thesizes (or approximates) the level regions of processor response 
through a foliation on the Cartesian product of the stems of fo- 
liations on the input space due to discrimination. Concepts, in 
neural signal processors, are identified with the level regions of 
processor response. 

It is important to appreciate that these axioms state the functional 
characteristics of the distinct components of neural signal processors 
and do not, in any sense, imply the specific details of the constituents of 
the processors: specification of the constituents, essential in the design 
of neural signal processors, will have to be addressed by the constraints 
imposed by the specific function representation problem at hand. Con- 
sidering the interpretative scope of the term artificial neural networks 
(synonymous with connectionist information processing), discussed in 
§ 1.1 (p. 5), the above axioms provide a pointer to the philosophical 
foundation of neural information processing. 

Operationally, neural signal processors have been shown to effect
(point-wise) nonlinear transformations between integral transforms: I
suggest this operational character to be the representational paradigm
of neural signal processing. Neural signal processors, in this
interpretation, are compared with the conventional approach of realizing signal
processors: the salient aspect of this comparison is that function repre- 
sentation in neural signal processors is attempted by a process involving 
a search for kernels of integral transforms appropriate to the desired 
processor described through examples, the mechanism of association be- 
tween the integral transforms being independent of the processor family, 
while the same is effected in conventional signal processing through 
an identification of an association appropriate to integral transforms 
evaluated independent of the family of processors. 

Features in the patterns presented to a processor, as understood in 
information processing contexts grounded in the current understand- 
ing of (human) perceptual abilities, are linked to integral transforms, 
the rationale being that kernels of integral transforms provide a tem- 
plate of the features, possibly known a priori, being discovered in input 
patterns. The representational paradigm of neural signal processing 
relates to the components described by the axiom of organization in the 
following sense. Measurements incorporate an extraction of features in 
the presented patterns, discrimination formalizes the decisions (in the
nature of predicates of an appropriate mode of logic) taken on features,
and aggregation allows a synthesis of concepts through decisions.

Approximation being the methodological basis of function
representation in neural signal processors, Kolmogorov's theorem on
representation of multi-variate functions has been interpreted in the
context of neural signal processing as being a characterization of the
representational complexity of given concepts. The constituent functions of the
representation in neural signal processors derived as an interpretation 
of Kolmogorov’s theorem on function representation have been related 
to the kernels of aggregation, these kernels being in the class of kernels 
of nonlinear Urysohn operators. A few representational features of ar- 
chitectures based on the axioms of neural signal processing have been 
studied with the help of the function representation scheme suggested 
by an interpretation, in the context of neural networks, of Kolmogorov's
theorem on function representation.

The representational nature of neural signal processors being one 
of inducing a foliation in the input space, and the foliation due to
measurement being effected through integral transforms, it is of interest to
know the nature of predicates operating on the input space. In partic- 
ular, the possibility of localization in the predicates and, thereby, the 
concepts and the nature of localization in function representation is im- 
portant in a study of neural signal processors. Localization in function 
representation becomes important as distinguishability between leaves 
is restricted to local regions by the axiom of discrimination. Chapter 6, 
the penultimate chapter of this thesis, will focus on the above mentioned
issues of neural signal processing.


It is common practice to use certain local operators to preprocess
patterns prior to their recognition. The best known of
these is the smoothing operator ... A neighbour set of numbers
is summed and the sum compared with a ... threshold. If the
sum is equal to or exceeds the threshold, a one is passed on
to the next stage of processing, [else] a zero is passed on
... The process is repeated over the entire field of the pattern.
This type of operator can be used to fill gaps in line patterns,
to thicken lines, or to remove small irregularities. It is usual to
take a symmetrical ... operator ... [The] directional properties
that it has stem from the intrinsic [symmetries in the operator].

— Michael J B Duff 

Parallel Computation in Pattern Recognition,
in Methodologies of Pattern Recognition,
edited by Satosi Watanabe
Academic Press, New York, 1969


Neural information processing, as a paradigm of representation, is a
statement of nonlinear association between integral transforms, as
indicated in the previous chapter. The key departure of neural signal
processing from classical approaches is in the incorporation of
nonlinearity, in general, in the measurement and aggregation kernels, and
in effecting a nonlinear transformation between the integral transforms
related to measurement and aggregation: these notions, though not
unknown in the literature of signal processing, are not predominant
due to analytical intractability. The integral transforms reflect the
essence of signal representation, and the nonlinear association effected
by activation functions is helpful in appraising the interplay, aided by
layering, between signal and system* representation.

While in conventional signal processing, the kernels are chosen to 
be independent of the signal class under investigation and the focus is 
to study the nature of association required between integral transforms 
of the input and output signals in order to realize the desired proces- 
sor functionality, the approach in neural information processing is to 
choose the kernel with the (point-wise) nonlinear association between 
integral transforms being independent of the processor class. Indeed, 
the kernels of the integral transforms of measurement and aggrega- 
tion, referring, indirectly, to the strength of interconnections between 
processing nodes in the network, form the parameters through which 
the effective processing functionality is understood and synthesized.


¹The reference is to information (signal) processing systems.

An immediate consequence of this complementarity in operation between
the two approaches to signal processing (the choice of association in
conventional signal processing and that of kernels in neural signal
processing) is that invertibility² of (integral) transforms and signal
reconstruction, which are of paramount importance in conventional
schemes (based on transform domain techniques) do not find the same 
prominence in the neural paradigm. While these issues are not alto- 
gether irrelevant to neural information processing, at an operational 
level, the neural paradigm is not contingent on either invertibility of 
integral transforms, or the need for ensuring signal reconstruction. 

The aggregation integral transforms are not restricted to be inter- 
preted as the inverse of the integral transforms of measurement, and 
it is imperative to appreciate the vast degree of freedom in proces- 
sor design ensuing as a consequence of this generalization. Though 
concepts represented at the outputs of neural signal processors belong 
to the same genre as the concepts (patterns or signals) presented as 
inputs or those in the intermediate levels of processing, as indicated 
while motivating the axiom of aggregation in § 5.2, it is important
to note that the character of concepts, measured through the
representational complexity suggested by the interpretation of Kolmogorov's
theorem in Theorem 5.4.1, is, in general, different at different levels
of processing, thereby providing ample scope for an exploitation of the
representational freedom ensured by aggregation kernels that differ
from the inverse of the measurement kernel: in studies of psychology,
the complexity of concepts is considered to increase with the degree of
association.

²This is a methodological requirement in conventional signal processing.

Despite these significant differences between the classical and neu- 
ral approaches to signal processing, it should be noted that localization 
of evaluation, central to signal processing having connotations of fea- 
ture extraction and evaluation of features, is still evident in neural 
signal processors. In this chapter, the nature, and influences of local- 
ization in neural signal processors will be studied. For simplicity in 
understanding, the discussion will begin by considering the sense in 
which isolated neurons effect a localization in their evaluation, and 
this study will be continued to appreciate the nature of localization in 
more general (layered) neural signal processors. 

Characterization of localization and the implications of localization 
in the realization of signal processors with neural networks will form 
the focus of study in the latter part of the discussion. Predicates realized 
through neural signal processors are shown to have localized influence 
and through this have been related to window transforms: on the basis
of the mechanism of localization, predicates are divided into
'intra-pattern' predicates and 'inter-pattern' predicates. Concepts are shown
to be represented in neural signal processors as localized regions in the
'sheaf' of input patterns.

While the operational aspect of neural information processing is
easily studied, in signal processing terminology, as nonlinear associations
between integral transforms whereby the focus of processor
representation reduces to a judicious selection of kernels for measurement and
aggregation integral transforms (as the mechanism of nonlinear
association is invariant to the processor class), the central problem of neural
signal processing, inspired by processes of perceptual relevance, is
considered to be the incorporation of knowledge of processing functionality
available through examples of input-output association. Processor
representation through kernel selection, while implicitly, though indirectly,
admitting 'learning by examples,' is more general, and accommodates
the possibility of specifying kernels based on qualitative information
about the class to which the processor belongs.

In § 6.1 I initiate a discussion on the nature of localization in isolated
neurons wherein the weights (equivalents of kernels in isolated
neurons), of processing nodes defined on function spaces, are established to
be window functions in order that representation is non-trivial. § 6.2
(p. 322) is a study on the representation of localization in neural signal
processors: localization related to the directional derivatives of
activation functions that are window functions is in focus. A characterization
of localization through window transforms and the implications of
localization on processor representation have been studied in § 6.3 (p. 332).
This chapter concludes with a cursory look, in § 6.4 (p. 344), into the
influence of kernels on representation potential.


6.1 Nature of Localization in Isolated Neurons

Isolated neurons, imposing categorization on pattern spaces, are for- 
mally described as (see Chapter 2 for notations) 

\[ \dot{\eta}(x,t) = a(\eta(x,t)) \left[ b(\eta(x,t)) + w \cdot x \right], \tag{6.1a} \]

\[ y(x,t) = \sigma(\eta(x,t) - \theta). \tag{6.1b} \]

An extension of this formal model of neurons to incorporate decisions on
function spaces is based on an alteration of the first of these expressions
to the form (as indicated in § 5.3)

\[ \dot{\eta}(x,t) = a(\eta(x,t)) \left[ b(\eta(x,t)) + \langle w, x \rangle \right], \tag{6.2} \]

with the added stipulation that $w$ and $x$ are functions (on $\mathbb{R}$);³ $a$, $b$,
and $\sigma$, while enjoying the same interpretation (and range spaces) as in
the case of neural decision elements on Euclidean pattern spaces, are
defined so as to be compatible for decision making on function spaces.
The decision $y$ is still a scalar, and decisions are taken on functions ($x$)
that are possibly evolving in time (denoted by $t$); the dependence of $x$
on $t$, however, is not explicitly indicated anywhere.

In the ensuing discussion, the only demand that will be made on the
functional structure of $\sigma$ is:

\[ \lim_{\xi \to +\infty} \sigma(\xi) = C_+, \qquad \lim_{\xi \to -\infty} \sigma(\xi) = C_-, \]

or, if $\lim_{\xi \to +\infty} \sigma(\xi) = \lim_{\xi \to -\infty} \sigma(\xi) = c \in [C_-, C_+]$, then

\[ \exists\, \xi \in (-\infty, +\infty) \text{ such that } \sigma(\xi) \neq C_- . \]

³In the case of neural decision elements on function spaces, $w$ and $x$ belong to a Hilbert
space of appropriate dimensions.

This structure allows the decision space $\mathcal{Y}$ to be minimally covered
by any compact, simply connected subspace of $\mathbb{R}$: thus $\sigma$ is unique up
to isomorphisms (eg, linear, invertible transformations) mapping, say
$[-1, 1]$, to $\mathcal{Y}$. For the sake of preciseness in argumentation, consider the
following.

DEFINITION 6.1.1 A (decision) function $f : \mathcal{X} \to \mathcal{Y}$ is said to be trivial
if it is a constant almost everywhere [ae], or if it is undefined ae.

This definition, similar to the notion of trivial dichotomies defined 
in Chapter 3, allows the following characterization of function repre- 
sentation in (isolated) neurons: it is of interest to contrast the ensuing 
statement with the notion of preservance weights introduced in Chap- 
ter 3 (§3.1). 

PROPOSITION 6.1.1 For any non-trivial (decision) function that is to
have a representation in an isolated neuron, with $\theta \in \mathbb{R}$,

(a) the weight vector ($w$) of the neuron (inducing decisions on
(Euclidean) pattern spaces) should be in $\ell^2$ (in general, in $\ell^p$, $p = 2, 3, \ldots$),

(b) the weight function ($w$) of the neuron (inducing decisions on
function spaces) should be in $L^2(\mathbb{R})$ (in general, in $L^p(\mathbb{R})$, $p = 2, 3, \ldots$).

PROOF:

(a) By definition, $\|w\|_p = \left( \sum_{i=1}^{n} |w_i|^p \right)^{1/p}$, $p = 1, 2, \ldots$, and thus $w \in \ell^p$ if
$\forall i$ $|w_i| < \infty$, $i = 1, 2, \ldots, n$ ($n$ finite), ie, all elements of $w$ are
finite. If one or more elements of $w$ are non-finite, then the
innerproduct $w \cdot x$ is either $+\infty$, $-\infty$ or undefined, for all $x \in \ell^2$
such that $\|x\| \neq 0$ and $x_i \neq 0$ whenever $w_i \notin \mathbb{R}$, $i = 1, 2, \ldots, n$.
Since the number of points $x \in \ell^2$ where $w \cdot x$ is neither $+\infty$, $-\infty$
nor undefined is almost countable for finite $n$, $w \cdot x = +\infty$, $-\infty$
or undefined ae. For any $\theta \in \mathbb{R}$, ie, $|\theta| < \infty$, $w \cdot x - \theta$ is either
$+\infty$, $-\infty$ or undefined. If $e_w$ denotes the exception set, ie,
$e_w = \{x \in \ell^2 \mid w \cdot x \text{ is neither } +\infty, -\infty \text{ nor undefined}\}$, then $\forall x \in \ell^2 \setminus e_w$,
$w \cdot x$ and, hence, $w \cdot x - \theta$ equals $c$, where $c$ is either $+\infty$, $-\infty$ or
undefined, which implies that $y(x) = c$ for all $x \in \ell^2 \setminus e_w$, where $c$
is either $+1$, $-1$ or undefined, noting that $\sigma$ evaluated on an
undefined domain point leads to an undefined range point. Thus if one
or more elements $w_i$ are non-finite, then the function represented
is trivial, thereby refuting the negation of the statement.

(b) This component of the Proposition is similar to the finite
dimensional case, (a), except that functions and the function space $L^2(\mathbb{R})$
take the place of vectors (sequences) and the sequence space $\ell^2$, and,
hence, a separate proof is not being provided.

□

The finite dimensional vector $x$ and the function $x$ are finite energy
patterns, ie, $x \in \ell^2$ and $x \in L^2(\mathbb{R})$, and the above Proposition stresses
on $w, x \in \ell^2$ and $w, x \in L^2(\mathbb{R})$ to ensure that the innerproducts $w \cdot x$ and
$\langle w, x \rangle$ are in $\mathbb{R}$ (through the Cauchy-Schwarz inequality), a necessary (but
not sufficient) condition for the admissibility of representation of
non-trivial functions in isolated neurons. I will now state a theorem from
the theory of Fourier Transforms (see Chui, 1992 for a proof) which will
be utilized occasionally. The operator of ordinary differentiation (of a
function), with respect to the independent variable, is denoted by $D$.

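THEOREM 6.1.1 If $f \in L^1(\mathbb{R})$ then its Fourier transform $\hat{f}$ satisfies:

1. $\hat{f} \in L^\infty(\mathbb{R})$ with $\|\hat{f}\|_\infty \le \|f\|_1$,

2. $\hat{f}$ is uniformly continuous on $\mathbb{R}$,

3. if the derivative $Df$ of $f$ also exists and is in $L^1(\mathbb{R})$ then $\widehat{Df}(\omega) = i\omega \hat{f}(\omega)$,

4. $\hat{f}(\omega) \to 0$ as $\omega \to \pm\infty$ (Riemann-Lebesgue Lemma), and

5. if $f \in L^2(\mathbb{R})$ in addition, then $\hat{f} \in L^2(\mathbb{R})$ and $\|f\|_{L^2(\mathbb{R})} = \|\hat{f}\|_{L^2(\mathbb{R})}$, up to the normalization constant of the transform
(Parseval identity).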

As a consequence of this theorem the following dual holds, the proof
of which follows exactly on the lines of that for Theorem 6.1.1 and,
hence, has not been included.

PROPOSITION 6.1.2 If $\hat{w} \in L^1(\mathbb{R})$ then the inverse Fourier transform
$w$ satisfies:

1. $w$ is uniformly continuous on $\mathbb{R}$,

2. if the derivative $D\hat{w}$ of $\hat{w}$ exists and $D\hat{w} \in L^1(\mathbb{R})$ then the inverse
Fourier transform of $D\hat{w}$, evaluated at $\tau$, equals $-i\tau w(\tau)$, and

3. $w(\tau) \to 0$ as $\tau \to \pm\infty$.

The hypothesis $\hat{w} \in L^1(\mathbb{R})$ is a necessary condition for $w$ to be
reconstructed from $\hat{w}$ using the Fourier inversion rule:⁴ thus $\hat{w} \in L^1(\mathbb{R})$
is assumed. If in addition⁵ $\hat{w} \in L^2(\mathbb{R})$ then, through Parseval identity
and the above Proposition, $w \in L^2(\mathbb{R})$, $w$ is uniformly continuous on $\mathbb{R}$,
and $w(\tau) \to 0$ as $\tau \to \pm\infty$, ie, $w$ is a localized function with vanishing
asymptotes, which motivates the following.

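PROPOSITION 6.1.3 If $w \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$ and $\hat{w} \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$, $D\hat{w}$
exists and is in $L^1(\mathbb{R}) \cap L^2(\mathbb{R})$, then the function $\tau w(\tau) \in L^2(\mathbb{R})$.

PROOF: Under the assumed hypothesis,

\[ \int_{-\infty}^{+\infty} d\omega\, e^{i\omega\tau} D\hat{w} = -i\tau w(\tau) \]

from Theorem 6.1.1 (item 3) and by Parseval identity $\|\tau w(\tau)\|_{L^2(\mathbb{R})} =
\|D\hat{w}\|_{L^2(\mathbb{R})}$ which establishes the necessary statement.

□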

⁴This feature is desired in all signal processing situations.

⁵The restriction of $x \in L^1(\mathbb{R})$ and consequently of (uniform) continuity of $\hat{x}$ on $\mathbb{R}$ is not
intentionally imposed. However, the physical considerations of realization in patterns
necessitate that $x \in L^2(\mathbb{R})$.

A function $w \in L^2(\mathbb{R})$ with the additional property that $\tau w(\tau) \in L^2(\mathbb{R})$
is known in the literature of window transforms (op cit) as a window
function, a term used to identify a class of localized functions with
vanishing asymptotes. Noting that linear spans of (suitably chosen)
window functions are dense in $L^2(\mathbb{R})$, this notion allows the preceding
discussion to be summed up concisely as the following without the need
for a proof.

THEOREM 6.1.2 It is necessary that the weighting function ($w$) of an
isolated neuron, defined to incorporate decisions on function spaces, be
in a linear span of window functions in order that non-trivial (decision)
functions are represented.

As $w$ is a linear combination of localized functions with vanishing
asymptotes, the innerproduct $\langle w, x \rangle$, in some sense, allows one to
consider a weighted average of restricted evaluations (discrimination) of
the function $x$; the restriction is to some domain smaller than that
over which $x$ is defined. Owing to the continuity of $w$, the
innerproduct $\langle w, x \rangle$ restricts attention to a connected subset of the domain of $x$.
Precise characterization of localization of incident patterns x and con- 
nectedness of the local region of x will be considered in a later section. 

The above theorem is applicable to dynamical, as well as static,
versions of neurons defined to operate on function spaces. It is easy to
see that in the case of dynamical neurons, localization of evaluation is
really spatial as the evaluation of $\eta$ and, consequently, that of $y$, at
every point $t \in \mathbb{R}_{0,+}$, is restricted to a (connected) region smaller than
that over which $x$ is defined. The region of localization is invariant in $t$
(commonly having the connotations of time) as the weighting functions
are defined to be indexed only by $\tau$ (spatial connotations) and not in $t$.
Hence, temporal localization cannot be assured. All of the above
observations can be easily transported to the realm of neurons defined on
finite dimensional pattern spaces as admissibility of the weight vector
$w$ in $\ell^2$ is assured by a weight $w$ in the linear span of (suitably chosen)
discrete window sequences.
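
The window-function condition and the localization of the innerproduct can be checked numerically. The following is a minimal sketch only: the Gaussian weighting function, its centre and width, the truncation of the real line to [-50, 50], and the two indicator patterns are all illustrative assumptions and do not come from the text.

```python
import math

def integrate(f, lo, hi, n=20000):
    """Composite trapezoidal rule for f on [lo, hi] with n panels."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

def window(tau, center=2.0, width=0.5):
    """Hypothetical weighting function: a Gaussian bump centred at `center`."""
    return math.exp(-0.5 * ((tau - center) / width) ** 2)

LO, HI = -50.0, 50.0  # assumed truncation of the real line

# Window-function conditions: w and tau * w(tau) both have finite energy.
energy_w = integrate(lambda t: window(t) ** 2, LO, HI)
energy_tw = integrate(lambda t: (t * window(t)) ** 2, LO, HI)

def pattern(start, stop):
    """Indicator pattern x supported on [start, stop] (illustrative only)."""
    return lambda t: 1.0 if start <= t <= stop else 0.0

x_near = pattern(1.0, 3.0)    # overlaps the region where w is concentrated
x_far = pattern(20.0, 22.0)   # lies far outside that region

# The innerproduct <w, x> only responds to activity near the window.
ip_near = integrate(lambda t: window(t) * x_near(t), LO, HI)
ip_far = integrate(lambda t: window(t) * x_far(t), LO, HI)

print(energy_w, energy_tw, ip_near, ip_far)
```

Both energies are finite, and the innerproduct with the distant pattern is negligible compared with the innerproduct with the nearby one, which is the sense of spatial localization discussed above.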


6.2 Representation of Localization in Neural Signal 
Processors 

Since processing in isolated neurons defined on function spaces is local- 
ized, it is natural to seek the nature of representation in neural signal 
processors described by the functional form 

(6.3a) 

= cr(fo(vj(o(^’^))> (6.3b) 

(6.3c) 

where, x & X, y e t] takes values in ti*“'(.T,t) = x Wx e X, 
Vi e 3to.+, and i = 1,2,. ..,k, 7 ^^) e i = 0,1,2,..., A;, 

E(^), and for appropriate C, are allowed to be either continuous, or
discrete, depending on the requirements of processing and/or analysis.
With an abuse of notation the following (simplified) convention is used.






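\[ \langle u, v \rangle = \begin{cases} \int_{\Gamma^{(\ell-1)}} d\gamma\, u(\gamma)\, v(\gamma) & \text{if } \Gamma^{(\ell-1)} \text{ is a continuous space,} \\ \sum_{\gamma \in \Gamma^{(\ell-1)}} u(\gamma)\, v(\gamma) & \text{if } \Gamma^{(\ell-1)} \text{ is discrete.} \end{cases} \]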

The other two inner product species are also similarly denoted. Propo- 
sition 6.1.1 (p. 317) immediately implies the following. 


PROPOSITION 6.2.1 A necessary condition for the measurement functions,
$\ell = 1, 2, \ldots, k$, to be non-trivial is that the measurement
weighting functions in type-$k$ neural signal processors be,
for all $t \in \mathbb{R}_{0,+}$, in a linear combination of window functions, for
$k = 1, 2, \ldots$, $\ell = 1, 2, \ldots, k$. In addition, it is necessary that
the aggregation weighting functions be in a linear combination of
window functions to ensure that the responses (with a
quantification as indicated earlier) of the neural signal processor be bounded.


Noting the way kernels have been defined in § 5.3, the preceding 
statement implies that kernels of the measurement and aggregation 
integral transforms are described by the following functional forms. 




[Equations (6.4a), (6.4b) and (6.4c): the kernel expressions are illegible in the source.]


In these expressions, $I$, with an appropriate suffix, denotes the
specific index values participating in the linear combinations, $w$ is used to
denote a window function (as indicated in § 2.1), and the dependence of
spatial and spectral localization on node indices ($\ell$ and $\gamma$) is abstracted
by the functions $\xi$ and $\theta$, respectively. Relationships between
structural aspects of the kernel and the representational character of neural
signal processors will be the focus of § 6.4 (p. 344).

According to the above Proposition, the measurement pertaining
to the node indexed by $\gamma^{(\ell)} \in \Gamma^{(\ell)}$ in layer $\ell$ is a weighted average
of localized evaluations of the pattern (of activity) indicated by the
concepts (or responses) of the preceding layer (subject to the
understanding that the response of layer 0 is $x$) and of the same layer, but the previous
time step. The response of the type-$k$ neural signal processor at
node $\gamma^{(\ell)} \in \Gamma^{(\ell)}$ in layer $\ell$ is also indicated to be a localized evaluation
of the decisions (inferences) of the corresponding $k$-layered neural
network which is the substrate of the type-$k$ neural signal processor.

The outputs are the result of the activation functions
acting (point-wise) on the measurements, and it is of interest to investigate the role of
this nonlinear (discriminatory) processing stage in the representation
of localization in neural signal processors. Note that the activation
functions $\sigma^{(\ell)}$, $\ell = 1, 2, \ldots, k$, of a processor are, in
general, functions with alternate stretches of monotonicity and
smoothness (at least) at the (possibly vanishing) asymptotes, typical examples
being the sigmoidal functions and radial basis (Gaussian) functions.

Gaussian functions, as described in § 2.2, are, by definition, window 
functions, and hence discrimination effected by such functions is of a
localized nature: this feature, together with norm based discriminants 
has been considered with sufficient interest in radial basis function net- 
works in view of the advantages, especially in parameter specification 
(ie, learning), offered by concepts that essentially reflect localized
(possibly compact) regions in the input pattern space. It is quite
disheartening to note that sigmoidal activation functions are not in $L^2(\mathbb{R})$, and,
consequently, these functions are not window functions, and the lack of
assurance in localization of evaluation has prompted radial basis func- 
tion networks to be considered superior to those employing sigmoidal 
discrimination of linear measurements. 

However, monotonic activation functions, eg, sigmoidal functions,
induce localization in the synthesized processor in a sense described in
the following. Though considerations of denseness in representation
restrict the activation functions $\sigma$ to be different from algebraic
polynomials [ae] (cf, Leshno, Ya Lin, et al, 1994), it should be noted that
the ensuing discussion on localization applies, equally, to all kinds of
monotonicity in $\sigma$.

PROPOSITION 6.2.2 $D^j\sigma$, $j = 1, 2, \ldots$, the derivative of a continuous
monotonic function $\sigma$, if it exists, with smoothness at the asymptotes is
a window function.

PROOF: This statement will be established through the principle of
mathematical induction on the order of differentiation.⁶

Verification 

Monotonicity implies that the first derivative, if it exists, is one-sided.
The additional requirement of smoothness at the asymptotes, ie, all
derivatives of $\sigma$ that exist should vanish at $\pm\infty$, together with
continuity immediately establishes that $D\sigma$ is a window function.

Inference 

Consider that $D^j\sigma$, for some $j$, $j = 1, 2, \ldots$, is a window function. In
view of the hypothesis that $\sigma$ is smooth at the asymptotes it
immediately follows that $\widehat{D^j\sigma}$, the Fourier transform of $D^j\sigma$, is also a window
function. As $D^{j+1}\sigma$, the derivative of $D^j\sigma$, is described in the spectral
domain as $\widehat{D^{j+1}\sigma}(\omega) = i\omega\, \widehat{D^j\sigma}(\omega)$, it immediately follows that $\widehat{D^{j+1}\sigma}$ is
in $L^2(\mathbb{R})$. This implies, by Parseval identity, that $D^{j+1}\sigma$ is in $L^2(\mathbb{R})$.
Smoothness of $\sigma$ at the asymptotes implies that $D^{j+1}\sigma$, the derivative
of $D^j\sigma$, is a window function.

⁶$D$ denotes the operator of differentiation with respect to the independent variable.

Conclusion

The validity of the claim for $j = 1$ and the assurance of validity for $j + 1$
conditional on the validity for every value of $j = 1, 2, \ldots$, establishes
the stated claim.

□ 


Figure 6.1 illustrates the nature of localization in the first three 
derivatives of a sigmoidal activation function. Localization of this na- 
ture is not restricted to monotonic functions alone and is applicable to 
activation functions that are continuous and piece-wise monotonic.⁷ As
the linear span of sigmoidal functions (in fact, continuous monotonic 
functions that are different from algebraic polynomials) is dense in the 
space of continuous functions (as demonstrated by Cybenko, 1989, 
and in § 5.1) and the differentiation operator is linear, the following 
statement is obvious. 

PROPOSITION 6.2.3 $D^j\sigma$, $j = 1, 2, \ldots$, the derivative of a continuous
piece-wise monotonic function $\sigma$, if it exists, with smoothness at the
asymptotes, is a window function.

The above statements assure that localization in discrimination, 
with piece-wise monotonic functions which are smooth at the
asymptotes, is available, if not in $\sigma$, at least in the variations of $\sigma$. Further,

⁷The derivatives of the sigmoidal activation function are good examples of piece-wise
monotonic functions.

Figure 6.1: Derivatives of a sigmoidal activation function. (a) $\sigma(\xi) = \tanh(\xi)$,
(b) $(D\sigma)(\xi) = 1 - \sigma^2(\xi)$, (c) $(D^2\sigma)(\xi) = 2\sigma^3(\xi) - 2\sigma(\xi)$,
(d) $(D^3\sigma)(\xi) = -6\sigma^4(\xi) + 8\sigma^2(\xi) - 2$, $\xi \in \mathbb{R}$.
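
The localization of the derivatives listed in the caption of Figure 6.1 can be illustrated numerically. This is a sketch under stated assumptions: the truncation of the real line to [-30, 30], the trapezoidal quadrature and the function names are illustrative choices, and only the closed forms of the derivatives of tanh are taken from the caption.

```python
import math

def integrate(f, lo, hi, n=20000):
    """Composite trapezoidal rule for f on [lo, hi] with n panels."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

def d1(xi):
    s = math.tanh(xi)
    return 1.0 - s * s                      # first derivative of tanh

def d2(xi):
    s = math.tanh(xi)
    return 2.0 * s ** 3 - 2.0 * s           # second derivative of tanh

def d3(xi):
    s = math.tanh(xi)
    return -6.0 * s ** 4 + 8.0 * s ** 2 - 2.0   # third derivative of tanh

L = 30.0  # assumed truncation of the real line
report = {}
for j, d in ((1, d1), (2, d2), (3, d3)):
    report[j] = {
        "mean": integrate(d, -L, L),                               # zeroth moment
        "energy": integrate(lambda t: d(t) ** 2, -L, L),           # ||D^j sigma||^2
        "tau_energy": integrate(lambda t: (t * d(t)) ** 2, -L, L), # ||xi D^j sigma||^2
    }

# The sigmoid itself has no finite energy: the integral grows with the range.
energy_15 = integrate(lambda t: math.tanh(t) ** 2, -15.0, 15.0)
energy_30 = integrate(lambda t: math.tanh(t) ** 2, -30.0, 30.0)

for j in report:
    print(j, report[j])
print(energy_15, energy_30)
```

The derivatives have finite energy both with and without the linear weighting, as required of window functions, whereas the energy of tanh itself keeps growing with the integration range. The zeroth moment of the second and third derivatives is numerically zero.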


localization effected by piece-wise monotonic functions is, in general, 
a weighted average of local functions in the sense that the derivatives 
of such functions belong to linear spans of window functions obtained 
as derivatives of the sigmoidal activation function. It would not be 
inappropriate to interpret discrimination as an estimation of the local- 
ization in evaluation due to the specific realization of the activation 
function $\sigma$ from sigmoidal functions.

While the influence of $\sigma$ on the measurements $\eta$ is to cause the
neural action (decision) $y$ to depend only on a local range of $\eta$, it is
imperative that the effect of this localized evaluation be known over
the input pattern space $\mathcal{X}$ (a function space). Noting that the input $x$
is, in this mildly generalized discussion, a function, attention will be
restricted to a linear subspace⁸

\[ \mathcal{X}_a = \{ x \mid x = \alpha x_a,\ \alpha \in \mathbb{R} \}, \quad x_a \in \mathcal{X}, \quad \|x_a\|_{L^2(\mathbb{R})} = 1. \]

Since $\mathcal{X}_a$ is the collection of range scaled versions of $x_a$, generality is not
lost in assuming $\|x_a\|_{L^2(\mathbb{R})} = 1$. To aid the study of localization, I will
consider the notion of directional derivatives (Corwin & Szczarba,
1982): this notion is reproduced below.


If $f$ denotes a function defined over every element of $\mathcal{X}_a$ then the
directional derivative⁹ operator (also termed as directional differentiation
operator) $D_{x_a}$, subject to the requirements of ordinary differentiation
on $f$, is given by

\[ (D_{x_a} f)(x) = \lim_{\delta \to 0} \frac{f((\alpha + \delta) x_a) - f(\alpha x_a)}{\delta}, \qquad x = \alpha x_a. \]

This operator allows a consideration of the evolution of $f$ relative to $x$ in
the direction $x_a$, the evolution being evaluated at the point $x$. When the
dimensionality of $\mathcal{X}$ is unity, as in the case of the range space of inner
products (discrete as well as continuous), then the directional derivative
operator reduces to the familiar ordinary differentiation operator. The
directional differentiation operator has characteristics similar to the
ordinary differentiation operator, in particular, the following.

⁸The symbol $\alpha$ is being reused in this chapter to mean the scale factor associated with
$x$. The earlier connotations of $\alpha$ being the norm of a (preservance) weight and index of
leaves in a foliation are being discontinued.

⁹The notion of directional derivative is to be considered in the context of functions
defined over finite-dimensional as well as infinite-dimensional spaces.

1. Additivity: $(D_{x_a}(f + g))(x) = (D_{x_a} f)(x) + (D_{x_a} g)(x)$, for all $f, g \in \mathfrak{X}$,

2. Homogeneity: $(D_{x_a} \beta f)(x) = \beta (D_{x_a} f)(x)$, for all $f \in \mathfrak{X}$, $\beta \in \mathbb{R}$ and
invariant with respect to $x$,

3. Product Rule: $(D_{x_a}(f g))(x) = ((D_{x_a} f) g)(x) + (f (D_{x_a} g))(x)$, for
all $f, g \in \mathfrak{X}$,

4. Chain Rule:¹⁰ $(D_{x_a} f(g))(x) = ((D_g f)(D_{x_a} g))(x)$ for all $f, g \in \mathfrak{X}$,

for all $x \in \mathcal{X}_a$. The $j$th order directional derivative operator in the
direction $x_a$ will be denoted by $D^j_{x_a}$, $j = 1, 2, \ldots$, the superscript
denoting the repetitive application of the first-order directional derivative
operator in the direction $x_a$, ie, $D^j_{x_a} = D_{x_a} D^{j-1}_{x_a}$.
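
The limit definition and the chain rule can be compared numerically for a single static neuron. The sketch below rests on several assumptions that are not in the text: the response is taken to be $y(\alpha) = \tanh(\alpha \langle w, x_a \rangle - \theta)$, the pattern and weight are Gaussians sampled on a finite grid, and the names `response`, `directional_derivative` and `chain_rule` are hypothetical.

```python
import math

H = 0.01                                        # grid spacing (assumed)
GRID = [-5.0 + i * H for i in range(1001)]      # sampled index variable tau

def inner(u, v):
    """Riemann-sum approximation of the innerproduct <u, v> on GRID."""
    return H * sum(a * b for a, b in zip(u, v))

w = [math.exp(-0.5 * t * t) for t in GRID]                 # window-like weight
x_raw = [math.exp(-0.5 * (t - 0.5) ** 2) for t in GRID]    # illustrative pattern
norm = math.sqrt(inner(x_raw, x_raw))
x_a = [v / norm for v in x_raw]                            # unit-norm direction

THETA = 0.2              # threshold (assumed value)
proj = inner(w, x_a)     # <w, x_a>, the measurement per unit scale alpha

def response(alpha):
    """Neural decision for the input x = alpha * x_a."""
    return math.tanh(alpha * proj - THETA)

def directional_derivative(f, alpha, delta=1e-6):
    """Difference quotient from the limit definition along x_a."""
    return (f(alpha + delta) - f(alpha)) / delta

def chain_rule(alpha):
    """(D sigma)(eta) * D_{x_a} eta, with D tanh = 1 - tanh^2."""
    y = response(alpha)
    return (1.0 - y * y) * proj

for alpha in (-2.0, 0.0, 0.5, 3.0, 20.0):
    print(alpha, directional_derivative(response, alpha), chain_rule(alpha))
```

As a function of $\alpha$, the directional derivative peaks where the measurement equals the threshold and decays to zero on either side, which is the localized behaviour attributed to the derivative of the activation function.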


PROPOSITION 6.2.4 The $j$th order directional derivative, $j = 1, 2, \ldots$,
in the direction $x_a$ of the response of a type-1 neural
signal processor, for all $t \in \mathbb{R}_{0,+}$, restricted to $\mathcal{X}_a \subset \mathcal{X}$, is given,
as a function of $\alpha$, by

(6.5)


¹⁰The operator $D_f$ denotes ordinary differentiation with respect to the function $f$.





PROOF: From Equation 6.3 (p. 322) and the properties of the directional
derivative operator, it is simple to see the following.




1 0 otherwise. 


(6.6)

(6.7) 


The required result is obtained by applying the chain rule of directional
derivative operators while evaluating $(D^j_{x_a} y)(\alpha)$, the directional
derivatives of the neural decisions in the first layer.

□ 


Proposition 6.2.2 (p. 326) and Proposition 6.2.3 (p. 327) establish
that the derivative $D\sigma$ of an activation function $\sigma$ is a local function
of its argument when the activation function is chosen to be piece-wise
monotonic, ie, the class of functions satisfying the axiom of
discrimination. For the same class of activation functions, it follows that the
directional derivatives of the response of a type-1 neural signal
processor are weighted averages of window functions: the weighting values
are provided by the kernel of aggregation. Proposition 5.2.3 (p. 252)
indicates that the functionality of a type-$k$ neural signal processor,
expressed as an indexed collection of operators on $\mathcal{X}$ (the index space is
$\mathbb{R}_{0,+}$), is composed of $k$ operators, each representing the functionality of
an appropriate type-1 neural signal processor. These observations lead
to the following statement.

THEOREM 6.2.1 The $j$th order directional derivatives, $j = 1, 2, \ldots$, of
the response of a type-$k$ neural signal processor, $k = 1, 2, \ldots$, belong to
the linear span of window functions.

PROOF: When $k = 1$, the above statement is equivalent to
Proposition 6.2.4. For other values of $k$ the statement follows from an
application of the chain rule for directional derivative operators on a $k$-stage
decomposition equivalent to the type-$k$ neural signal processor (cf
Proposition 5.2.3 (p. 252)).

□ 


6.3 Characterization of Localization 

Localization of processor functionality, as studied in the preceding sec- 
tions, is essentially a spatial characterization. From a signal processing 
point of view, the localization need not be restricted to spatial charac- 
terizations and can also include spectral characterizations. Thus it is 
not adequate only to investigate spatial localizations. Since the spa- 
tial characterization shows that the influence of kernels of the integral 
transforms related to measurement and aggregation in neural signal 
processors is one of restricting the weighting functions (weights) to be 
in the linear span of window functions (sequences) and the directional 
derivatives of neural signal processor response are similarly localized 
functions, an analysis based on the principle of uncertainty, typical in
discussions involving window transforms,¹¹ would provide the necessary
links between spatial and spectral localizations.

In this section, I will begin with a characterization of the localization 
of an isolated neuron preparatory to a characterization of the influence 
of kernels on the representation provided by type-1 neural signal pro- 
cessors. Following this I will investigate the nature of spatial-spectral 
localization induced in neural signal processor response by the
activation functions. The weighting function $w \in \mathcal{W}$ in an isolated neuron
having been established, in § 6.1 (p. 316), as a spatially localized func- 
tion with vanishing asymptotes, a spectral characterization is provided 
through the following. 

PROPOSITION 6.3.1 If the function $w$ is such that $w \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$,
and the first $j$ derivatives of $\hat{w}$ exist then

1. if the first $j$ derivatives of $\hat{w}$ are in $L^1(\mathbb{R}) \cap L^2(\mathbb{R})$ then $\tau^j w(\tau) \in L^2(\mathbb{R})$,

2. if the first $j$ derivatives of $\hat{w}$ are zero at the origin (ie, $\omega = 0$, $\omega$
denoting the spectral indexing variable) then $\int d\tau\, \tau^p w(\tau) = 0$,
$p = 1, 2, \ldots, j$, ie, the first $j$ moments vanish.

This Proposition is stronger than Proposition 6.1.3 (p. 320), and
suggests the correlation between the (Fourier) spectral nature of the
weighting function $w$ and the potential of $w$ being a window function
with vanishing moments. From signal processing considerations
$\hat{w} \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$ is not an unrealistic assumption and, hence, the above
property of weighting functions $w \in \mathcal{W}$ will be assumed.

¹¹See Chapter 2 for the essential aspects of window transforms.

Window functions with at least the zeroth moment (essentially the
average value) vanishing, ie, $\int d\tau\, w(\tau) = 0$, are considered as basic
wavelet windows in the theory of wavelets, and considerations of
computational convenience and stability require as many as possible of
the initial moments to vanish. However, the requirement of vanishing
moments would severely restrict the choice of weighting functions, and
would disallow many valid candidate functions: the Gaussian function,
a typical example of a window function, with non-zero mean value has
none of its moments vanishing.

Yet, a consideration of weighting functions in terms of wavelets is 
attractive enough not to be missed, and as an inevitable compromise 
between the conflicting requirements, the weighting functions will be 
considered as a mixture of wavelets, typically affine combinations
familiar in the wavelet representation of functions. However, such affine
mixtures, by virtue of linearity, will still have their initial moments
vanishing as indicated in the following: this is true only when the
component wavelets in the mixture are not subjected to a null scale value. (A
scaling by a null value, ie 0, is operationally the equivalent of
introducing a constant.)

PROPOSITION 6.3.2 Define the $p$th moment of a function $f$, $f : \mathbb{R} \to \mathbb{R}$,
as $M_f^p = \int d\tau\, \tau^p f(\tau)$. Given two basic wavelets $b_1$ and $b_2$ such
that their first $j$ moments vanish, ie, $M_{b_i}^p = 0$, $p = 1, 2, \ldots, j$, $i = 1, 2$.
Let $w(\tau) = a_1 b_1(\beta_1 \tau - \theta_1) + a_2 b_2(\beta_2 \tau - \theta_2)$, $\beta_1, \beta_2, a_1, a_2, \theta_1, \theta_2 \in \mathbb{R}$.

Then the moments of $w$ vanish for $p = 0, 1, \ldots, j$ as

\[ M_w^p = \frac{a_1}{\beta_1^{p+1}} \sum_{q=0}^{p} \binom{p}{q} \theta_1^{p-q} M_{b_1}^q + \frac{a_2}{\beta_2^{p+1}} \sum_{q=0}^{p} \binom{p}{q} \theta_2^{p-q} M_{b_2}^q . \]

Association of the weighting function w with a window function 
enables a characterization of the nature of spatial localization induced 
in the response of a neural signal processor. The following statements 
are based on arguments in Chui (1992). 


Tuioj^osoo'ooi^ 6,3.3 A weighting function w derived from a basic 
wavelet window b by scaling and/or translation 


w{t) = b(/3r - 0, Vr C 3?, /? ^ 0, C ^ (6*8) 


where 0 is the scale factor and ^ is the translation, localizes evaluation 
of the innerproduct {w, x) to the index window (ie spatial localization) 



0 ’ 



zAb 

0 


of the input pattern (function) x. The index window is centered at = 
^ ^ and has a width = 2^, where, and A^ are, respectively, 

the center and width of the basic wavelet b. 
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6*3.4 The index window of a weighting function tv 
given by Equation 6,8 is a connected subset of^ if the basic wavelet b is 
a continuous function, 

6.3.1 Weighting functions $w \in \mathfrak{W} \subset L^2(\Re) \cap \cdots$ having a wavelet representation

$$w(\tau) = \sum_{i \in I} \alpha_i\, b_i(\beta_i \tau - \zeta_i)$$

where $\{b_i\}_{i \in I}$ is a collection of basic wavelets, and $\alpha_i = \langle w, b_i \rangle$, $i \in I$, exhibit the following.

1. The innerproduct $\langle w, x \rangle$ of the input pattern (function) x with w evaluates x on a localized region of its definition.

2. An isolated neuron equipped with this weighting function w and discrimination enforced by the activation function $\sigma$ evaluates a predicate that is a weighted average of localizations on the input pattern (function) x. This predicate is essentially an assessment of the relative organization of assignments in x. The averaging of the constituent localized weightings of x, decided by the coefficients of representation of w in the collection of wavelets $\{b_i\}_{i \in I}$, is over different scales and shifts.
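The two statements can be made concrete with a small numerical experiment. The sketch below is illustrative only: the Mexican-hat basic wavelet, the coefficients, scales and shifts, the test pattern and the hard-limiting activation are all assumptions introduced for the example. It shows that the innerproduct is a weighted sum of localized evaluations, and that a disturbance of x away from every index window leaves the neural decision unchanged.

```python
import numpy as np

def integrate(y, x):
    """Trapezoidal rule for samples y taken on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def mexican_hat(t):
    return (1.0 - t**2) * np.exp(-0.5 * t**2)

tau = np.linspace(-50.0, 50.0, 200001)

# Hypothetical representation of w: (alpha_i, beta_i, zeta_i) per wavelet.
terms = [(1.0, 2.0, 0.0), (-0.6, 1.0, 5.0), (0.3, 4.0, -8.0)]

def w(t):
    return sum(a * mexican_hat(beta * t - zeta) for a, beta, zeta in terms)

def inner(x_vals):
    """Innerproduct <w, x> on the grid tau."""
    return integrate(w(tau) * x_vals, tau)

def neuron(x_vals, theta=0.0):
    """Hard-limiting neuron: 1 if <w, x> exceeds the threshold, else 0."""
    return int(inner(x_vals) > theta)

x = np.cos(0.7 * tau) + 0.2 * tau
bump_far = 5.0 * np.exp(-0.5 * (tau - 30.0)**2)   # far from every window
bump_near = 5.0 * np.exp(-0.5 * (tau - 5.0)**2)   # inside the second window

ip = inner(x)
ip_far = inner(x + bump_far)
ip_near = inner(x + bump_near)

# Weighted sum of the localized evaluations contributed by each wavelet.
parts = [a * integrate(mexican_hat(beta * tau - zeta) * x, tau)
         for a, beta, zeta in terms]
```

Here `ip_far` is indistinguishable from `ip`, whereas `ip_near` differs appreciably, and `sum(parts)` reproduces `ip`.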

Piece-wise monotonicity in activation functions, a feature typical of 
the prevailing tradition in neural signal processing, has been shown in 
§ 6.1 to incorporate localization in the response of an isolated neuron.
Localization due to monotonicity in activation functions that are also 
continuous, and, preferably, analytic is in the sense of the directional 
derivatives of the neural response, expressed as a function of traversal 
in the direction of differentiation, being window functions. 

Isolated neurons whose activation functions are represented as a lin- 
ear combination of sigmoidal functions, spanning^^ the class of neurons 
with piece-wise monotonic activation functions, are similar in structure 
and function to neural signal processors, and consequently, the charac- 
terization of the nature of localization in the predicates of isolated neu- 
rons is provided later on in this section through a characterization of 
the localization in the predicates of neural signal processors. Presently 
I will focus on the nature of processors represented in isolated neurons 
due to sigmoidal activation functions and will hint at the similarity in 
representation with sigmoidal and Gaussian activation functions. 


^^Recall the density theorem, originally due to Cybenko (1989), and rephrased in § 5.1. This theorem assures that any continuous function of interest can be represented, with arbitrary accuracy, through finite linear combinations of sigmoidal functions.

Proposition 6.3.5 For a sigmoid function

$$\sigma(\tau) = \frac{1 + \tanh(\tau)}{2}, \quad \tau \in \Re$$

($\sigma \in C^\infty$), the derivatives $D^j\sigma$, j = 2, 3, ..., are

1. basic wavelet windows, ie, $\int d\tau\, D^j\sigma(\tau) = 0$,

2. admissible wavelets, ie, $C_{D^j\sigma} = 2\pi \int d\omega\, |\omega|^{-1}\, |\widehat{D^j\sigma}(\omega)|^2 < \infty$.

The first part is obvious from the definition of the sigmoid 
function. Figure 6.1 (p. 328) provides ample justification for the state- 
ment. I will prove only the admissibility of the derivatives of sigmoids 
as wavelets. 

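Denote $\hat{f}_j(\omega) = \frac{1}{i\omega}\widehat{D^j\sigma}(\omega)$. This simplifies the expression for $C_{D^j\sigma}$ to

$$C_{D^j\sigma} = 2\pi \int_{-\infty}^{+\infty} d\omega\, |\hat{f}_j(\omega)|\, |\widehat{D^j\sigma}(\omega)|.$$

By the Cauchy-Schwarz inequality,

$$C_{D^j\sigma} \le 2\pi\, \|\hat{f}_j\|_{L^2(\Re)}\, \|\widehat{D^j\sigma}\|_{L^2(\Re)} = 2\pi\, \|f_j\|_{L^2(\Re)}\, \|D^j\sigma\|_{L^2(\Re)}.$$

The equality on the right side of the above expression is simply Parseval's identity. Thus $C_{D^j\sigma} < \infty$ if $f_j \in L^2(\Re)$ and $D^j\sigma \in L^2(\Re)$. As $\hat{f}_j(\omega) = \frac{1}{i\omega}\widehat{D^j\sigma}(\omega)$, it is simple to observe from the theory of Fourier transforms that $f_j(\tau) = \int_{-\infty}^{\tau} d\tau'\, D^j\sigma(\tau')$, ie, $f_j$ is the integral of $D^j\sigma$. From the definition of the sigmoid function, the desired result is established for j = 2, 3, ..., noting that for these values of j, the local nature of the derivatives of $\sigma$ (established in Proposition 6.2.2 (p. 326) and Proposition 6.2.3 (p. 327)) ensures the existence of $f_j$, and, thereby, $C_{D^j\sigma}$.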

□ 
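A numerical check of both parts of the statement is sketched below. It is an illustration under stated assumptions rather than a substitute for the proof: the sigmoid is taken to be σ(t) = (1 + tanh t)/2, the Fourier transform is taken in its unitary convention, and the admissibility integral is evaluated by plain quadrature on a truncated frequency range. The function names are hypothetical.

```python
import numpy as np

def integrate(y, x):
    """Trapezoidal rule for samples y taken on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Derivatives of the assumed sigmoid sigma(t) = (1 + tanh t) / 2.
def d1_sigmoid(t):
    return 0.5 / np.cosh(t)**2

def d2_sigmoid(t):
    return -np.tanh(t) / np.cosh(t)**2

def d3_sigmoid(t):
    return (3.0 * np.tanh(t)**2 - 1.0) / np.cosh(t)**2

t = np.linspace(-40.0, 40.0, 8001)
means = [integrate(d(t), t) for d in (d1_sigmoid, d2_sigmoid, d3_sigmoid)]

def spectrum_magnitude(psi, omega):
    """|psi_hat(omega)| (unitary convention) by direct quadrature."""
    re = np.array([integrate(psi(t) * np.cos(wk * t), t) for wk in omega])
    im = np.array([integrate(psi(t) * np.sin(wk * t), t) for wk in omega])
    return np.hypot(re, im) / np.sqrt(2.0 * np.pi)

def admissibility_constant(psi, omega_max, n=2001):
    """2*pi * integral of |psi_hat|^2 / |omega| over a truncated range."""
    omega = np.linspace(1e-4, omega_max, n)
    mag = spectrum_magnitude(psi, omega)
    # The integrand is even in omega: integrate over omega > 0 and double.
    return 2.0 * np.pi * 2.0 * integrate(mag**2 / omega, omega)

c2_short = admissibility_constant(d2_sigmoid, 10.0)
c2_long = admissibility_constant(d2_sigmoid, 20.0, n=4001)
```

The first derivative integrates to one (it is a window function but not a wavelet), while the second and third derivatives integrate to zero. Doubling the frequency range leaves the estimate of the admissibility constant of the second derivative essentially unchanged, which is consistent with a finite value.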


Proposition 6.3.6 A Gaussian function $\sigma_g$, easily seen to be an analytic function, has its derivatives $D^j\sigma_g$, j = 1, 2, ..., as

1. basic wavelet windows, ie, $\int d\tau\, D^j\sigma_g(\tau) = 0$,

2. admissible wavelets, ie, $C_{D^j\sigma_g} = 2\pi \int d\omega\, |\omega|^{-1}\, |\widehat{D^j\sigma_g}(\omega)|^2 < \infty$.

The above result is not surprising on observing that the derivative of a sigmoidal function, $D\sigma$, a window function, is a very good approximation to the Gaussian function $\sigma_g$, except for range translation. As a consequence of the above Propositions, the following is evident.

6.3.7 The directional derivatives of the response of an isolated neuron, in any direction, expressed as functions of traversal in the direction of differentiation, are basic wavelet windows and admissible wavelets.

Concepts represented by neural signal processors are local in char- 
acter, and the nature of localization is strongly dependent on choices 
made regarding weighting functions (ie, measurement and aggregation 
kernels) and activation functions. Recall the operational nature of a 
simplified version of an isolated neuron, 

$$y(x) = \sigma(\langle w, x \rangle - \theta),$$

where $\theta \in \Re$ is the threshold and

$$w(\xi) = \sum_{i=1}^{N} \alpha_i\, m(\beta_i \xi - \zeta_i), \quad \xi \in \Xi,$$

where m is a window function whose spatial and spectral characteristics are determined by $\beta$ and $\zeta$, N is a priori chosen, and $\Xi$ is the common space^^ of definition of x and w. It is not difficult to visualize that the neural action (decision) is a statement comparing a weighted average of localized assessments of x with the threshold $\theta$, the sense of averaging and localization being decided by the specific window functions in use.

^^This reduces to Equation 5.7 (p. 276) if x and, consequently, w are drawn from a finite-dimensional Hilbert space, a situation reflected in the notations $\mathbf{x}$ and $\mathbf{w}$, respectively, with the understanding that $\langle \mathbf{w}, \mathbf{x} \rangle = \mathbf{w}^T\mathbf{x}$.

With the binary comparator $\sigma = \sigma_H$ (see § 2.2), it would not be erroneous to declare that an isolated neuron evaluates a predicate on the input pattern x ($\mathbf{x}$); the nature of the predicate is decided by the component window functions of w ($\mathbf{w}$). Though this aspect has been pointed out by McCulloch & Pitts (1943) (as propositions) and by Minsky & Papert (1969) and has provided the basis for decision making with neural networks, it is important to recognize the local character of the predicate: localization influenced by the nature of the weighting function (or vector) w ($\mathbf{w}$) is not unrelated to the notion of diameter limitedness considered by Minsky.

^^Note that if x (and w) are finite dimensional, the set $\Xi$ is isomorphic to a set containing the first n naturals, n being the number of (distinct) basis vectors necessary in describing $\mathbf{x}$ and $\mathbf{w}$.

Isolated neurons represent unquantified predicates of the first order logic. Continuous versions of the activation function $\sigma$ initiate an interpretation of neural response as predicates of first order fuzzy logic. Neurons with a response derived as monotonic activation functions operating on discriminants that are linear, due to the innerproduct operation, are not capable of representing all possible predicates (of first order logic), and this limitation is overcome by networks of neurons. As these localization predicates are dependent on the relative organization of assignments in the input pattern x ($\mathbf{x}$), I will term the predicates arising from the influence of weighting functions as intra-pattern predicates.
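The limitation of a single neuron with a linear discriminant, and its removal by a network, is classically illustrated by the exclusive-or predicate. The sketch below is only an illustration: the grid search over weights and thresholds is a finite check rather than a proof, and the particular two-layer weights are one of many possible choices. All names are hypothetical.

```python
import itertools
import numpy as np

def hard_limiter(u):
    """Binary comparator: 1 where u > 0, else 0."""
    return (np.asarray(u) > 0).astype(int)

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor_target = np.array([0, 1, 1, 0])

def single_neuron(w, theta):
    """Isolated neuron: monotonic activation on a linear discriminant."""
    return hard_limiter(inputs @ np.asarray(w) - theta)

# Finite search over weights and thresholds for a single neuron.
grid = np.linspace(-3.0, 3.0, 25)
realizable_by_one_neuron = any(
    np.array_equal(single_neuron((w1, w2), th), xor_target)
    for w1, w2, th in itertools.product(grid, repeat=3)
)

def two_layer_xor(x):
    """Two hidden predicates (OR, AND) combined by an output neuron."""
    hidden_or = hard_limiter(x @ np.array([1, 1]) - 0.5)
    hidden_and = hard_limiter(x @ np.array([1, 1]) - 1.5)
    return hard_limiter(hidden_or - hidden_and - 0.5)

network_output = two_layer_xor(inputs)
```

No weight and threshold combination on the grid realizes the predicate with one neuron, whereas the small network does.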

Decisions (y) in neural signal processors are intra-pattern predi- 
cates when attention is restricted to the influence of measurement and 
aggregation kernels on processor functionality. Noting that decisions 
of the final layer in a layered neural signal processor are dependent on 
those in the previous layers, the following characterization is helpful in 
understanding the representation of perceptually relevant operations. 

6.3.2 A decision in a type-k neural signal processor, k = 1, 2, ..., is an unquantified intra-pattern predicate of higher order logic (fuzzy, if activation functions, $\sigma$, are continuous) when the isolated influence of the kernels of measurement and aggregation integral transforms is considered.


As unquantified predicates are no different from relations, the above statement states that neural networks are capable of representing relations, the arity increasing with the number of layers.

Since localization in evaluation is affected by the choice of activation functions, it is of interest to study the implications of such a localization. Noting that measurement kernels induce a foliation (possibly of non-linear manifolds, as the depth of association, ie, layering, increases), and the role of discrimination is to effect a reordering of the components of the foliation, neural response incorporates the result of a comparison of relative organization of assignments between members in the input pattern space. However, the activation function, in general, is not a window function, nor can it always be expressed as a finite linear combination of window functions (eg, representation of the sigmoidal function in the linear span of window functions cannot be assured as the sigmoid is not in $L^2$), and, hence, inter-pattern predicates cannot be assured in the neural decision.

But, as established in § 6.2 (p. 322), directional derivatives (of all orders, assuming existence) of neural response, restricted to a linear subspace of inputs, are superpositions of window functions; for the directional derivatives to have any meaning, it is essential that the activation functions are differentiable, preferably analytic. In order to distinguish the localization due to activation functions from that due to measurement and aggregation kernels, I will term the directional derivatives of the neural response as (directional) inter-pattern predicates. While it is of interest to know the nature of inter-pattern predicates, in the sense of the system of logic incorporated, this problem has not been addressed in this thesis.
As neural signal processor responses (concepts) are derived as linear combinations of neural network decisions, it is important to note that concepts in neural signal processors refer to a superposition of labels (or signal assignments), the superposition being dictated by the predicates realized in the decision units. (This motivates the term 'aggregation kernel' for $K_v$.) It is immediately evident that concepts realized in neural signal processors can be taxonomical or complexive (see § 2.3) depending on the degree of overlap between the different predicates that constitute the concept. Since the concepts are derived from decisions, it is natural to expect the responses of neural signal processors to reflect localized evaluation of inter-pattern and intra-pattern relative organization of assignments.

In general, the design of a neural signal processor involves selec- 
tion of activation functions as well as measurement and aggregation 
kernels, and, hence, a concomitant appreciation of intra-pattern and 
inter-pattern correlations is imperative. Noting that the space of input 
patterns is, in the language of category theory, a sheaf^ (cf, Tennison, 1975), it is not difficult to visualize that concepts and decisions in
the processing units of neural signal processors restrict evaluation to 
localized regions in the sheaf of input patterns. These regions, in gen- 
eral, have multiple connected components, the connectedness of each 
component arising from continuity in the activation functions. 

As the notion of a sheaf is rather involved, I will not attempt to provide a concise introduction of the same.
6.4 Kernel Influence on Representation

In an earlier section, I have shown that it is necessary for the weighting 
functions (weight vectors in the case of neurons on a finite-dimensional 
input space) to belong to a collection that is a linear combination of local 
functions (sequences), the requirement stemming from considerations 
of realization. As a consequence, the kernels of the integral trans- 
forms of measurement and aggregation are restricted to be semi-local 
functions. In a type-A; neural signal processor, ie, a member of *^(n(i), 
Vi € Ko,h-, the kernel is local in 

is local in 7 ^^) and is local in where G 

t=l,2,.. A:,7W Gr(^),f = 0,1, 2 ,. .A:,ifc = l,2,. .. 

As a specific instance of kernels with a (semi) local character, I 
will consider the case when the kernels of the integral transforms of 
measurement and aggregation are chosen (or restricted) to be in the 
class of reproducing kernels.^® Reproducing kernels are kernels $K(\xi, \gamma)$, wherein the indices $\xi$ and $\gamma$ belong to appropriate sets $\Xi$ and $\Gamma$, respectively, with the property (Aronszajn, 1950)

$$f(\xi) = \langle K(\xi, \gamma), f(\gamma) \rangle, \qquad (6.9)$$

where the equality is to be understood in the $L^2$ sense. Note that this property requires $\Xi$ to be the same as $\Gamma$ for consistency. The collection of functions that satisfy the property indicated in Equation 6.9 (when completed to form a Hilbert space) is termed a reproducing kernel Hilbert space (RKHS) with a reproducing kernel $K$. In order that the RKHS is valid, $K(\xi, \gamma)$, expressed as a function of $\gamma$ for every value of $\xi \in \Xi = \Gamma$, should satisfy the reproducing property in Equation 6.9.

It is simple to see from Equation 6.9 that all reproducing kernels are local in $\gamma$.

Locality of $K$ is necessary to ensure existence of the inner product.

It is necessary that a bivariate function $K(\xi, \gamma)$ exhibit the following properties (op cit) to be a reproducing kernel of a Hilbert space consisting of real valued functions.

1. Symmetry.^ $K(\xi, \gamma) = K(\gamma, \xi)$.

2. Non-negativity. $K(\xi, \xi) \ge 0$ and $|K(\xi, \gamma)|^2 \le K(\xi, \xi)\, K(\gamma, \gamma)$.

A kernel satisfying the above properties has been shown (op cit) to be uniquely associated with a RKHS. If F denotes a Hilbert space of finite dimension, say N, containing real valued functions and $\phi_1, \phi_2, \ldots, \phi_N$ are N linearly independent functions of F then the reproducing kernel of the RKHS F is given by

$$K(\xi, \gamma) = \sum_{i=1}^{N} \sum_{j=1}^{N} \beta_{ij}\, \phi_i(\xi)\, \phi_j(\gamma) \qquad (6.10)$$

where $[\beta_{ij}]$ is the inverse of the Gram matrix $[\langle \phi_i, \phi_j \rangle]$ for the system of functions $\{\phi_i\}$.
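The construction of a reproducing kernel from a finite system of linearly independent functions through the inverse of their Gram matrix can be sketched numerically. The example is illustrative only: the monomial basis on [−1, 1], the quadrature rule and all names are assumptions made for the sketch. It verifies the reproducing property of Equation 6.9 for a function in the span of the basis.

```python
import numpy as np

def integrate(y, x):
    """Trapezoidal rule for samples y taken on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

gamma = np.linspace(-1.0, 1.0, 20001)

# N = 3 linearly independent functions spanning the space (assumed basis).
phi = np.vstack([np.ones_like(gamma), gamma, gamma**2])   # shape (N, M)

# Gram matrix of inner products and its inverse, the kernel coefficients.
gram = np.array([[integrate(pi * pj, gamma) for pj in phi] for pi in phi])
beta = np.linalg.inv(gram)

def basis_at(xi):
    return np.array([1.0, xi, xi**2])

def kernel_row(xi):
    """K(xi, gamma) as a function of gamma for a fixed xi."""
    return basis_at(xi) @ beta @ phi

# A member of the space: f(gamma) = 0.5 - gamma + 2 gamma^2.
f_coeffs = np.array([0.5, -1.0, 2.0])
f = f_coeffs @ phi

xi = 0.3
reproduced = integrate(kernel_row(xi) * f, gamma)   # <K(xi, .), f>
direct = float(f_coeffs @ basis_at(xi))             # f(xi)
```

The innerproduct of `kernel_row(xi)` with `f` returns the value of `f` at `xi`, and the coefficient matrix `beta` is symmetric, as required of a real reproducing kernel.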

Consider a type-1 feed-forward neural signal processor which has 
the same activation function operating on all the measurements: 

A reproducing kernel for a Hilbert space consisting of complex valued functions is Hermitian, ie, $K(\xi, \gamma) = \overline{K(\gamma, \xi)}$.
where $x : \Gamma^{(0)} \to \Re$ denotes the input concept (pattern) from which the (desired) newer concepts $\mathfrak{y}$ (whose domain is the same as that of the function x) are realized. The measurement kernel $K_w^{(1)}$ is chosen to be a reproducing kernel for the collection of functions $\mathfrak{X}$ (the input concept, or pattern, is drawn from this space) with the accompanying assumption (of consistency) that $\Gamma^{(0)} = \Xi^{(1)}$. For convenience in analysis, I assume $\mathfrak{X}$ to be a RKHS with $K_w^{(1)}$ as the reproducing kernel.

The reproducing property of $K_w^{(1)}$ on the functions in $\mathfrak{X}$ simplifies Equation 6.11 to the following expression:

As a result, the collection of thresholds in the first layer, ie, $\theta^{(1)}(\xi^{(1)})$, $\xi^{(1)} \in \Xi^{(1)}$, is the equivalent of a template of the concept under test in the incident pattern. The comparison effected through the activation function $\sigma$ is in the sense of the deviation of values in the presented concept (pattern) x from those indicated in the template $\theta^{(1)}$.

Recalling the discussion in Chapters 3 and 4, the notion of reproducing kernels is analogous to that of preservance applied to the collection of weights indicated by the weight matrix.^® This matrix is the same as the measurement kernel $K_w^{(1)}(\xi^{(1)}, \gamma^{(0)})$ with the interpretation that the index of processing nodes, $\xi^{(1)}$, takes on the discrete values in $\{1, 2, \ldots, m_1\}$ and $\gamma^{(0)}$, the channel index in any processing node, is assigned values from the set $\{1, 2, \ldots, m_0 = n\}$.

*®In contrast, the measurement kernel $K_w$, in all layers, of the neural signal processors considered in Chapters 4, 5 and the earlier sections of this chapter, has the interpretation of providing a template of the patterns under test.

^®The matrix has been introduced in § 5.2.

Consider the class of measurement kernels as constructed below.

f = l, 2 , ..n, ( 6 . 12 a) 

e (5(^(')-ii)<5(7(‘’^-ii){±2^ Ii = l,2, .. 72 }, 

71,72 = 1,2,. n, (6.12b) 

such that the assignments to $K_w^{(1)}$ satisfy the properties of symmetry and non-negativity indicated earlier and for every value of $\xi^{(1)}$, $\xi^{(1)} = 1, 2, \ldots, n$, $K_w^{(1)}$ is not the same for all values of $\gamma^{(0)}$, $\gamma^{(0)} = 1, 2, \ldots, n$. (See Theorem 3.1.2 (p. 115) for the necessity of the latter restriction on the assignments to the measurement kernel.) The following is then easily observed.

Theorem 6.4.1 A type-1 neural signal processor with a measurement kernel constructed as in Expression 6.12, given n, n = 1, 2, ..., is equivalent to a processor wherein the weights in the distinct nodes are the
preservance weights of the discrete input space r = 1,2,. 

( E 5R+ and ^ € 5?'', such that for all i, i = 2,3, ...mi = /no ~ 

\\d'H\ = \\d^\\. 


In § 4.1 I have discussed the nature of representation in type-1 
neural signal processors wherein the weights of the distinct processors 
belong to the class of preservance weights of the discrete space $\mathcal{P}^n(\zeta, \nu)$. Note that a kernel defined as

$$K(\xi, \gamma) = \delta(\xi - \gamma), \qquad (6.13)$$

where $\delta$ indicates the Dirac delta function, trivially satisfies the reproducing property indicated in Equation 6.9 (p. 344) for all functions in the collection of square integrable functions defined on $\Re$.

In addition, Aronszajn (1950) has indicated that if is the 

reproducing kernel of the RKHs iq then is the 

7 

reproducing kernel of the class of functions F given by 


These observations suggest that the kernel of measurement integral 
transform constructed as in Expression 6.12 is a reproducing kernel, 
thereby pointing to the similarity in the notions of preservance in the 
context of discrete spaces and that of reproducing kernels in a more 
general context. As an example consider the discrete kernel 



$$\begin{pmatrix} 1 & -2 & 4 \\ -2 & 4 & 1 \\ 4 & 1 & 2 \end{pmatrix}$$


This is a reproducing kernel and, as indicated in Table 3.1 (p. 123), each 
rowis apreservanceweightof7^r(C52)>^" = li2, . ; C G 5^+ and ^ G 5ft^. 


Similarity between the notions of preservance and reproducing kernels need not be restricted to the preservance weights of $\mathcal{P}^n(\zeta, 2)$. As discussed in § 3.4, the notion of preservance is applicable to non-null weights in $\Re^n$ (the discrete space preserved has been termed as preservance input space). This admits a larger class of reproducing kernels, than that suggested in Expression 6.12, to be compared with a collection of preservance weights.

However, the notion of preservance in discrete spaces is not the same as the property of reproduction in the measurement kernels. The differences are due to the fact that while in preservance it is important to ensure order preservation, no such restriction is made in the case of the reproducing property of kernels. (It is simple to see that uniqueness preservation is assured in the reproduction property and the condition of regularity in preservance is introduced only for analytical convenience.) In addition, note that not all collections of preservance weights (even in the restricted case when $m_0 = n$, the number of input channels, is the same as $m_1$, the number of processing nodes in the type-1 neural signal processor) are reproducing kernels.

Nashed & Walter (1991) remark that to every reproducing kernel a sampling theorem is associated, ie, given that the kernel $K_w^{(1)}(\xi^{(1)}, \gamma^{(0)})$ is a reproducing kernel for the Hilbert space $\mathfrak{X}$ there exist (sampling) functions $S_i(\gamma^{(0)})$, for denumerable integer values of i, such that (note that $K_w^{(1)}$ is a reproducing kernel for $\mathfrak{X}$)

$$x(\gamma^{(0)}) = \sum_{i} x_i\, S_i(\gamma^{(0)}) \quad \text{for all } x \in \mathfrak{X}. \qquad (6.14)$$

In the above equation, the equality is in the $L^2$ sense and all concepts (considered analogous to signals) in $\mathfrak{X}$ are assumed to be band-limited with a spectral band $[-\Omega, \Omega] \subset \Re$, and $\{x_i\}$ refers to a denumerable collection of uniformly spaced^° samples of the incident concept x; the sampling period is decided by the Nyquist rate appropriate to a bandwidth $\Omega$. The convolutional kernel described by Equation 6.13 (p. 348) is the simplest example of a sampling function.
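A familiar instance of a sampling theorem associated with a reproducing kernel is the Shannon series for band-limited functions, where the sampling functions are shifted sinc functions. The sketch below uses that instance as an illustration; the test concept, the sampling period, the truncation to finitely many samples and the names are assumptions of the example and are not taken from the text.

```python
import numpy as np

def x(t):
    """Band-limited test concept: the spectrum of sinc^2 lies in |f| <= 0.4."""
    return np.sinc(0.4 * t)**2

T = 1.0                          # sampling period; rate 1.0 exceeds Nyquist rate 0.8
n = np.arange(-200, 201)         # practical truncation of the sample index
samples = x(n * T)

def reconstruct(t):
    """x(t) ~ sum_i x_i S_i(t) with S_i(t) = sinc((t - i T) / T)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.sinc((t[:, None] - n[None, :] * T) / T) @ samples

t_test = np.linspace(-5.0, 5.0, 1001)
max_error = float(np.max(np.abs(reconstruct(t_test) - x(t_test))))

# Interpolation property: each S_i is one at its own sample point, zero at others.
on_grid = reconstruct(n[195:206] * T)
```

The reconstruction error on the test interval is limited by the truncation of the series, and on the sampling grid the series returns the samples themselves.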

The sampling functions are related to the reproducing kernel 
in the following way (op cit). 

= (6 15) 

J 

where i and j take on denumerable integer values and $[K_w^{-1}]_{ij}$ is the inverse of the kernel (matrix) $K_w^{(1)}(\xi^{(1)}, \gamma^{(0)})$ (restricted such that $\xi^{(1)} \in \{i\}$, $\gamma^{(0)} \in \{j\}$) as an operator on $\ell^2$. (This inverse is bounded as $K_w^{(1)}$ is non-singular.) While it is desired that the sampling functions form an orthonormal basis, the minimal structure required in the collection of sampling functions is biorthogonality:

= 6(i - j) , (6.16) 

where i and j take on denumerable integer values. The orthogonal basis 
of the RKHs corresponding to the reproducing kernel Kw^ is given by 

= ( 6 . 17 ) 

3 

^°This discussion need not be restricted to the case of uniformly sampled sequences, as 
the reconstruction of functions with a sequence of non-uniformly sampled values of the 
function is assured, under appropriate conditions, by the Paley-Wiener sampling theorem 
(Benedetto (1992)) 
On an incorporation of the representation of the incident concepts (patterns) x as suggested by the sampling theorem indicated in Equation 6.14, the evaluation of a type-1 neural signal processor modeled in Equation 6.11 (p. 345) reduces to the following expression (note the reproducing nature of the kernel $K_w^{(1)}$):

^%{x) = (6.18) 

t 

where {xj } denotes the sequence of samples of x. In the above Equation, 
the number of samples of the incident concept x is denumerable. 

However, if in addition to band-limitedness, the concept x exhibits a locality in the domain of definition, ie, x expressed as a function of $\gamma^{(0)} \in \Gamma^{(0)}$ is in the linear span of a finite number of suitably chosen window functions, then a representation of x through a sequence of samples, as in Equation 6.14, will need no more than a finite number of (uniformly^^ spaced) samples: representation is in the sense of minimizing the $L^2(\Re)[\mu]$ norm (with a measure $\mu$) of the error in approximating x with a superposition of a finite number of samples weighing the sampling functions. (See Daubechies, 1992, for the reasoning involved in this statement.) An implication of the conjoint spatial-spectral locality in the incident patterns, $x \in \mathfrak{X}$, follows.

^^The adequacy of a finite number of samples for a (satisfactory) representation of 
signals (functions) that have a compact support in the (Fourier) spectrum and exhibit 
locality in the domain of definition (eg, time) is not limited to the case of uniform sampling. Representation of non-uniformly sampled signals is commonly addressed through the Paley-Wiener sampling theorem (Benedetto, 1992).
Theorem 6.4.2 A neural signal processor with a finite number of input elements represents an evaluation on continuous concepts when the kernel of the measurement integral transform is a reproducing kernel for the space of incident concepts.

The above statement implies that even though the inputs are restricted to a subspace $\mathcal{X}$ of the finite dimensional Euclidean space $\Re^n$, the foliations due to measurement are induced on $\mathfrak{X}$ ($\mathcal{X} \subset \mathfrak{X}$), the collection of band-limited continuous concepts exhibiting locality in the domain of definition, when the measurement kernel is a reproducing kernel for the RKHS that embeds $\mathcal{X}$.^^ Note that the template is still assumed to have a continuous domain of definition, indicating that the collection of processing nodes is indexed on a continuous set.
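The implication can be sketched for the sinc kernel, which reproduces band-limited functions. In the illustration below a layer with finitely many input channels, whose weights are sampling functions evaluated at the node positions, yields the same decisions as a comparison of the continuous concept with the template at those positions. The concept, the template, the node positions, the truncation of the sample index and the tanh-based sigmoid are all assumptions of the example.

```python
import numpy as np

def sigma(u):
    """Assumed sigmoidal activation."""
    return 0.5 * (1.0 + np.tanh(u))

def concept(t):
    """Band-limited, local test concept (continuous domain)."""
    return np.sinc(0.4 * t)**2

def template(t):
    """Hypothetical threshold template, itself band-limited."""
    return 0.5 * np.sinc(0.2 * t)

T = 1.0
channels = np.arange(-200, 201)            # input channel indices
x_samples = concept(channels * T)          # the finite input vector
nodes = np.linspace(-4.25, 4.25, 18)       # node positions off the sampling grid

# Weight of channel j at node i: the sampling function S_j evaluated at node i.
weights = np.sinc((nodes[:, None] - channels[None, :] * T) / T)

discrete_out = sigma(weights @ x_samples - template(nodes))
continuous_out = sigma(concept(nodes) - template(nodes))
max_gap = float(np.max(np.abs(discrete_out - continuous_out)))
```

The residual gap between the two outputs is due only to the truncation of the sample sequence.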

Having observed the nature of representation in (type-1) neural signal processors that restrict the measurement kernel, $K_w^{(1)}$, to the class of reproducing kernels, it is natural to seek the representational characteristics of type-1 neural signal processors wherein the aggregation kernel is a reproducing kernel. For the aggregation kernel $K_v^{(1)}$ to be meaningful as a reproducing kernel, the collection of decisions, where every member is expressed as a function indexed on the (continuous) set $\Xi^{(1)}$, has to be a non-trivial subset of an appropriate Hilbert space.

^^The reproducing property of a kernel for a RKHS, say F, is inherited by every subspace of F (Aronszajn, 1950).
Since, for every value of $\xi^{(1)}$ in $\Xi^{(1)}$, the decision is derived from a function indexed in $\Gamma^{(0)}$ (see Equation 6.11 (p. 345)) through the activation function $\sigma$, and as the integrability^^ of $\sigma$ is not assumed in the discussion, the existence of a metric appropriate to the collection of decisions, an axiomatic requirement for a Hilbert space, cannot be assured. The influence, on representation, of the reproducing property in the aggregation kernels is, thereby, not considered.

In view of the preceding discussion on the nature of representation in type-1 neural signal processors with measurement kernels of the reproducing type (see Theorem 6.4.2), it is easy to observe the following.^^ (Note that $\mathfrak{y}^{(0)} = x$, the incident concept (signal).)

Theorem 6.4.3 A type-k neural signal processor, k = 2, 3, ..., with discrete number of nodes in each layer represents an evaluation of continuous, band-limited and local concepts (signals) $\mathfrak{y}^{(\ell)}$, $\ell = 0, 1, \ldots, k$, when the measurement kernels $K_w^{(\ell)}$, $\ell = 1, 2, \ldots, k$, are of the reproducing type.

As shown in Chapter 5, sigmoidal activation functions are not integrable.

^^The representational aspect due to the reproducing property in the measurement kernels is uninfluenced by the reproducing nature of the aggregation kernels.

One of the issues that is important in a discussion involving the representation of continuous concepts (signals) through samples is that of aliasing. If the finite number of samples $\{x_i\}$ of x, in a type-1 neural signal processor, are obtained at a rate (assumed uniform, for convenience) lesser than the minimum prescribed by the Nyquist rate for the reproducing kernel $K_w^{(1)}$, the reconstruction, $\tilde{x}$, of x through the finite number of samples will, in general, be erroneous with respect to the actual concept x. However, noting that $\tilde{x}$ is compared with the template $\theta^{(1)}$ through the activation function $\sigma$, the error, $\|x - \tilde{x}\|$, influences the decision only in a local neighbourhood of the template $\theta^{(1)}$, the extent of locality depending on the nature of the activation function.

In the case of sigmoidal activation functions (including hard-limiter), the error due to aliasing is not appreciable when the incident concept x is at a considerable distance from $\theta$. (This is due to the saturating nature of a sigmoidal function $\sigma$.) As a consequence, considerations of aliasing on the sampling of x (in terms of sampling rate in the case of uniform sampling) depend on the spectral characteristics of the boundary of the partition induced by the activation function $\sigma$ on the RKHS embedding the input concept space $\mathfrak{X}$.
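The role of saturation can be illustrated with a small computation. In the sketch below a fixed reconstruction error stands in for the effect of aliasing, and the resulting change in the decision is evaluated at several distances between the concept value and the template. The tanh-based sigmoid, the size of the error and the distances are assumptions chosen for the illustration.

```python
import numpy as np

def sigma(u):
    """Assumed sigmoidal activation."""
    return 0.5 * (1.0 + np.tanh(u))

def decision_error(distance_from_template, reconstruction_error):
    """|sigma(d + e) - sigma(d)| for a concept value at distance d from the template."""
    d = np.asarray(distance_from_template, dtype=float)
    return np.abs(sigma(d + reconstruction_error) - sigma(d))

distances = np.array([0.0, 1.0, 3.0, 5.0])
errors = decision_error(distances, 0.1)
```

The same reconstruction error changes the decision by about five percent of the output range at the template and by a negligible amount a few units away, so the sampling rate matters mainly near the decision boundary.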

Note that the decision boundary, ie, the boundary of the partition, is given by the leaf of the foliation on $\mathfrak{X}$, due to the measurement kernel $K_w^{(1)}$ (assumed to be a reproducing kernel), which maps to the template $\theta^{(1)}$. But, in order to ensure that the operation of subtraction in Equation 6.18 (p. 351) is meaningful, the function (concept) $\theta^{(1)}$ specifying the collection of thresholds is to be a member of $\mathfrak{X}$, whence the considerations of sampling on $\mathfrak{X}$ depend on the 'concept template' $\theta^{(1)}$.

See § 6.2 and § 6.3 for a discussion on the influence of $\sigma$ on the local nature of processing in neural signal processors.
Theorem 6.4.3 implies that considerations of sampling on the collection of concepts $\mathfrak{y}^{(\ell)}$ depend on the 'concept template' $\theta^{(\ell+1)}$, $\ell = 0, 1, \ldots, k-1$, k = 1, 2, .... A discussion on the nature of sampling functions, their dependence on the 'spectral' characteristics of the 'concept templates' $\theta$ and the associated issues of signal realization are not in the scope of this thesis. I now consider, briefly, the aspect of representation when the kernel $K_e$ corresponding to measurements due to lateral interaction is of reproducing type in addition to the measurement kernel $K_w$. Without any loss of generality the restricted case of type-1 neural signal processors is considered.

Arguing on the lines of feed-forward neural signal processors with measurement kernels that are of reproducing type, a type-1 neural signal processor wherein the measurement kernels $K_w^{(1)}$ and $K_e^{(1)}$ are both of reproducing type implies that the evaluation of a neural signal processor defined on the space of (continuous, band-limited and local) concepts $\mathfrak{X}$ and $\mathfrak{Y}$ (with the index sets $\Gamma^{(0)}$ and $\Xi^{(1)}$ being continuous) is equivalent to the following functional form.

j(i) |(0) 

p(U 

This is identical to Equation 5.1 (p. 238)^® with fc = 1 when the (discrete) 
sampling functions and for appropriate values of 

and are interpreted as the feed-through and lateral interaction 


is the discrete time-travel index 
connection strengths associated with the channels (indexed by
converging on the processing nodes with indices respectively.^' 

(Note that the symmetry of reproducing kernels has been invoked to arrive at this interpretation.) Thus the following is evident.

Theorem 6.4.4 The measurement kernels^® ${}^{d}K_w^{(\ell)}$ and ${}^{d}K_e^{(\ell)}$, $\ell = 1, 2, \ldots, k$, associated with a type-k neural signal processor, k = 1, 2, ..., defined with finite number of nodes represent the sampling functions corresponding to the measurement kernels, of reproducing type, ${}^{c}K_w^{(\ell)}$ and ${}^{c}K_e^{(\ell)}$ of a type-k neural signal processor that is defined to establish mappings between continuous, band-limited and local concepts (signals) through layers of continuously indexed processor arrays.

A similar interpretation is applicable to the aggregation kernels when the activation function $\sigma$ is chosen to be integrable, eg, Gaussian functions $\sigma_g$ (see Equation 2.13 (p. 62)), so that the space of decisions $\{\mathfrak{y}^{(\ell)}\}$, in each layer $\ell = 1, 2, \ldots, k$, in a type-k neural signal processor can be embedded in an appropriate Hilbert space. The above theorem suggests that the vector of connection strengths associated with every processing node is a distinct discretized (sampled) sampling function. The collection of inputs to the node denotes the samples of the continuous concept (signal).

^’’This interpretation is not specific to neural signal processors with a non-null kernel 
of lateral interaction. 

^®The prescripts ’c’ and ’d’ are used to distinguish the measurement kernels correspond- 
ing to neural signal processors involving processors indexed on continuous and discrete 
sets, respectively. 
Theorem 6.4.4 (and a similar statement regarding aggregation kernels in neural signal processors restricted to incorporate activation
functions that are integrable) is a characterization of the nature of 
information stored (represented) in the connection strengths between 
the distinct processing nodes in an ensemble of neurons: this character- 
ization is possible in the narrow, but not unrealistic, context of layered 
processing structures. Based on this discussion, I will now investigate 
the nature of representation in neural signal processors to characterize 
the operational aspect of neural networks. 

In § 5.4, the issue of an automatic specification of weights, i.e.,
learning, has been interpreted, equivalently, as a design of the kernels
of the integral transforms of measurement and aggregation. Such an
operational scheme is based on a characterization of kernels as
functions of two variables and is related to the incorporation of a
priori, but partial, knowledge of the interconnection strengths
(equivalently, sampling functions) between the processing nodes. An
approach to the synthesis of the kernels of the integral transforms of
measurement and aggregation is indicated in Equation 6.10 (p. 345): the
kernels are of the reproducing type for a Hilbert space whose basis
functions are related to the linearly independent functions that are
involved in the realization of the kernels (see Equation 6.17 (p. 350)).
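The reproducing property invoked here can be illustrated on a small discrete domain. The following is a minimal sketch, not the construction of Equation 6.10: it assumes a hypothetical orthonormal set of three Haar-like vectors on a four-point domain (the names `BASIS`, `build_kernel` and `apply_kernel` are illustrative only) and forms the kernel as a sum of products of basis functions. Under these assumptions the kernel is symmetric, reproduces every function in the span of the basis, and annihilates functions orthogonal to that span.

```python
import math

# Hypothetical orthonormal basis on a 4-point domain (Haar-like vectors).
# Only three vectors are used, so they span a proper subspace of R^4.
H = 0.5
R = 1.0 / math.sqrt(2.0)
BASIS = [
    [H, H, H, H],
    [H, H, -H, -H],
    [R, -R, 0.0, 0.0],
]


def build_kernel(basis):
    """K[s][t] = sum_i phi_i[s] * phi_i[t]: symmetric by construction."""
    n = len(basis[0])
    return [[sum(phi[s] * phi[t] for phi in basis) for t in range(n)]
            for s in range(n)]


def apply_kernel(kernel, f):
    """Discrete analogue of the integral transform: g[t] = sum_s f[s] K[s][t]."""
    n = len(f)
    return [sum(f[s] * kernel[s][t] for s in range(n)) for t in range(n)]


K = build_kernel(BASIS)

# A function inside the span is reproduced (up to rounding) ...
inside = [2.0 * a + 1.0 * b - 3.0 * c for a, b, c in zip(*BASIS)]
# ... while a function orthogonal to the span is annihilated.
outside = [0.0, 0.0, 1.0, -1.0]

reproduced = apply_kernel(K, inside)
projected = apply_kernel(K, outside)
```

Each column of `K` plays the role of a sampling function in this toy setting: taking the inner product of a function in the span with column `t` returns the value of that function at point `t`.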


^®Note that if the aggregation kernels are chosen to be of the reproducing type, the 
activation functions should correspondingly be chosen to be of the integrable type if the 
internal representations are to be interpreted in terms of (non-uniform) samplings (and
associated reconstruction) of the incident concepts. 
Consider the measurement operation of a neural signal processor of
the non-evolutionary type defined on the function space

(6.20)


where G e = 9 ^ and r)^”i)(.T) = .t g X. 

Recall from Equation 6.10 that a self-reproducing measurement kernel
on a finite dimensional space is given, for some $\ell$, $\ell = 1, 2, \ldots$, by
with the associated interpretation of the functions and the
coefficients. In the above expression, the coefficients, the functions
and the indices $i$ have been scripted by the layer index $\ell$ to
highlight, in the symbolization, the fact that the 'reproducing
feedthrough measurement kernel basis' functions are not required to be
the same in all the layers. The prefix 'w' is used to denote that the
entities correspond to the kernel of the integral transform of
measurement due to feedthrough associations. In a similar way,
analogous entities corresponding to the kernels $K_e$ and $K_v$ are
prefixed by 'e' and 'v', respectively.


An incorporation of the decomposition in Equation 6.21 of the mea- 
surement kernel into the expression (Equation 6.20) 

^^Without any loss of generality, the ensuing discussion can, trivially, be extended to
include an investigation on the nature of representation through the measurement kernel
$K_e$ and the aggregation kernel $K_v$.


Section 6.4. Kernel Influence on Representation


359 


for the measurements in layer $\ell$ results in the following.


(6.22)


for all e and 7 (^“^) G From this equation, it is clear that 

the collection of measurements ($\eta$) in any layer is a representation of the
concept (signal) incident on that layer. This interpretation assures that 
the notion of representation used in this thesis to mean a decomposition 
and/or synthesis of the desired functions in a ’basis’ that is itself realized 
in accordance with the processing requirements is not invalid.^^ 


In order that the above aspect is appreciated, consider the nature of
measurement functions in an evolutionary neural signal processor with
multiple layers. On lines similar to the derivation of Equation 6.22, the
measurement functions in layer $\ell$ in a neural signal processor with
feedthrough associations as well as lateral interactions are given by the
following ($u$ is the discrete time travel index).





(6.23)


^^The notion of representation is reaffirmed by the function representation theorem
originating in a solution, by Kolmogorov (1957a), to Hilbert's 13th problem.
for all G g and 7 ^^) € It is of interest to

note that the collection of responses (i.e., the processed concept) is a
representation of neural decisions; this representation is in a basis that
is chosen (or sought) for the realization of aggregation kernels that are
self-reproducing.


(x, ’^)= Y1 

for all 7^^) G and G 


The requirement of locality on the kernels states that the measurement
kernel be a local function in the variable $\gamma^{(\ell-1)}$ for all
values of its other argument. (Note that such a locality is required of
the aggregation kernels also: there, the locality is in the other
argument for all values of the variable $\gamma^{(\ell)}$.) In addition,
the measurement kernel is to be symmetric in its two arguments to be
admissible as a self-reproducing kernel. These requirements stipulate
that the 'reproducing feedthrough measurement kernel basis' functions be
chosen to be local functions.


In particular, consider the 'reproducing kernel basis' functions to be
the directional derivatives of a scalar valued non-evolutionary neural
signal processor of type-1 with sigmoidal activation functions. Note
that the class of functions realized by type-1 neural signal processors
is dense in the space of continuous functions (cf. Theorem 5.1.1 (p. 243),
which can be trivially extended to the case of neurons defined on
function spaces); thus no generality is lost, in this preliminary
investigation, by restricting attention to 'reproducing kernel basis'
functions derived from functions realized by type-1 neural signal
processors. The analyticity of the sigmoidal activation functions, in
addition, assures that the class of functions containing the directional
derivatives, in all directions $x \in X$ ($\|x\| = 1$), of the collection
of functions realized by type-1 neural signal processors with sigmoidal
activation functions is dense in the space of continuous functions.
Proposition 6.2.4 (p. 330) and Theorem 6.2.1 (p. 332) establish that the
directional derivatives, in all directions in the input space and all
(positive integer) orders, of type-$k$, $k = 1, 2, \ldots$, neural signal
processors with activation functions that satisfy the axiom of
discrimination are local functions.

Let the ’reproducing feedthrough measurement kernel basis’ func- 
tions be given by the following expression. 

for all $\ell = 1, 2, \ldots$, for some values of $j$, $j = 1, 2, \ldots$.

In the above equation, refers to a non-evolutionary type-1
neural signal processor defined on the (function) space $X$: the concepts
(signals) drawn from this space are denoted by $x$ and the direction in
which the differentiation is considered is denoted by • The functions
and are chosen to establish mappings between denumerable 
spaces, and are interpreted as the scale and translation values that the 
basic function (^))(^^^^) is subjected to. If Z denotes the set of 
denumerable integers, then by Cantor's diagonal (pairing) argument the
set $Z \times Z$ is equinumerous with $Z$. This aspect has encouraged the
use of a singly indexed family of functions rather than the doubly
indexed families of functions familiar in the discussion of wavelets.
However, the necessity for distinctness between the scale and
translation values has been retained through the two functions
introduced above.
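The passage from a doubly indexed family to a singly indexed one can be made concrete. The sketch below is only one possible choice and is not taken from the thesis: it folds the integers onto the naturals, applies the Cantor pairing function, and unfolds the result, so that every (scale, shift) pair in $Z \times Z$ receives a distinct index in $Z$. The names `single_index` and `scale_and_shift` are hypothetical stand-ins for the index map and for the two functions that recover the scale and translation values from the index.

```python
import math


def fold(z):
    """Bijection Z -> N: 0, -1, 1, -2, 2, ... map to 0, 1, 2, 3, 4, ..."""
    return 2 * z if z >= 0 else -2 * z - 1


def unfold(n):
    """Inverse of fold."""
    return n // 2 if n % 2 == 0 else -((n + 1) // 2)


def single_index(scale, shift):
    """Map a (scale, shift) pair in Z x Z to one integer index in Z."""
    a, b = fold(scale), fold(shift)
    # Cantor pairing on the naturals, then back to the integers.
    return unfold((a + b) * (a + b + 1) // 2 + b)


def scale_and_shift(index):
    """Recover the (scale, shift) pair from the single index."""
    n = fold(index)
    w = (math.isqrt(8 * n + 1) - 1) // 2   # diagonal that contains n
    b = n - w * (w + 1) // 2
    return unfold(w - b), unfold(b)


pairs = [(s, t) for s in range(-6, 7) for t in range(-6, 7)]
indices = [single_index(s, t) for s, t in pairs]
```

Because the map is invertible, a family indexed by the single integer loses none of the distinction between scale and translation; that distinction is carried by the inverse map.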

Proposition 6.3.5 (p. 337) establishes that the second and higher
derivatives of the sigmoidal function are basic and admissible wavelet
window functions. Recall, from Proposition 6.2.4 (p. 330), the structure
of the basic function. If, in this function, the constituent neural
signal processor is composed as a (finite) linear combination of
sigmoidal activation functions and $j$ is restricted to take integer
values no smaller than 2, then it is immediately evident that the
'reproducing feedthrough measurement kernel basis' functions are
synthesized (represented) in a basis of admissible wavelets. This aspect
is true of all activation functions that are continuous (analytic) and
satisfy the axiom of discrimination; however, depending on the specific
nature of the functional dependence, the first derivative of the
activation function can also be an admissible wavelet (e.g., in the case
of Gaussian activation functions, the first and higher derivatives are
basic and admissible wavelet window functions).
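A necessary condition for a window function to be a wavelet window is that its zeroth moment vanishes. The following sketch checks this numerically for the first three derivatives of the sigmoid, assuming $\tanh$ as the sigmoidal function; the helper names and the trapezoidal integration over a truncated interval are choices made for this illustration. It tests only the zero-mean condition and is not a proof of admissibility.

```python
import math


def sech2(x):
    return 1.0 / math.cosh(x) ** 2


def tanh_d1(x):   # first derivative of tanh
    return sech2(x)


def tanh_d2(x):   # second derivative of tanh
    return -2.0 * math.tanh(x) * sech2(x)


def tanh_d3(x):   # third derivative of tanh
    return sech2(x) * (4.0 * math.tanh(x) ** 2 - 2.0 * sech2(x))


def zeroth_moment(f, half_width=20.0, steps=40000):
    """Trapezoidal estimate of the integral of f over [-half_width, half_width]."""
    dx = 2.0 * half_width / steps
    total = 0.5 * (f(-half_width) + f(half_width))
    total += sum(f(-half_width + k * dx) for k in range(1, steps))
    return total * dx


moments = [zeroth_moment(f) for f in (tanh_d1, tanh_d2, tanh_d3)]
```

The first moment estimate is close to 2 (the total rise of $\tanh$), so the first derivative has non-zero mean, while the estimates for the second and third derivatives are numerically zero, in agreement with the restriction of $j$ to values no smaller than 2.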

Note that in the representation suggested by Proposition 6.2.4 the
scale factor and the translation of the wavelet window functions in the
composition are governed by the dependence of the measurement kernel
(actually the inner product of weighting functions with the (functional)
direction of differentiation) and the threshold, of the non-evolutionary
type-1 neural signal processor, on the indexing variable of the decision
units. The denumerable collection of admissible wavelets forms a frame
when these admissible wavelets are derived, through discrete scaling and
shifts in the domain, from a basic wavelet that is the second, or
higher, derivative of an activation function that is continuous and
satisfies the axiom of discrimination. While a proof of this statement
has not been provided in this thesis, the correctness can easily be seen
by recognizing the following.

Every activation function that is continuous and satisfies the axiom
of discrimination has a representation in terms of a linear combination
of shifted sigmoids as indicated in § 5.4. Thus the derivatives of
order $j$, $j = 2, 3, \ldots$, of all such activation functions are finite
linear combinations of (domain) scaled and translated derivatives, of
order $j$, of the sigmoidal activation function. Note that the first
derivative of the sigmoidal function $\tanh(\xi)$ is
$\operatorname{sech}^2(\xi)$, a very good approximation to the Gaussian
functions. On lines similar to those used to establish the frame property
of a denumerable collection of (domain) scaled and shifted Gabor
functions, a denumerable collection of functions, each given by a
(domain) scaling and shift of the first derivative of the sigmoidal
activation function, can be established as a frame.
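The closeness of $\operatorname{sech}^2$ to a Gaussian can be checked numerically. The sketch below assumes, as a reference, the unit-height Gaussian $\exp(-\xi^2)$, which has the same curvature at the origin; the text does not say which Gaussian is intended, so this choice is an assumption. It measures the largest pointwise gap on a grid and compares the two functions in the tail.

```python
import math


def sech2(x):
    return 1.0 / math.cosh(x) ** 2


def gaussian(x):
    # Assumed reference Gaussian: unit height, same curvature at the origin.
    return math.exp(-x * x)


grid = [k / 100.0 for k in range(-600, 601)]
max_gap = max(abs(sech2(x) - gaussian(x)) for x in grid)

# sech^2 decays exponentially, the Gaussian faster, so the tails differ.
tail_ratio = sech2(4.0) / gaussian(4.0)
```

On this grid the two functions stay within a tenth of the peak height of each other, so the approximation is good in the bulk. The tails differ markedly, since $\operatorname{sech}^2$ decays only exponentially.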




364 


Chapter 6 Localization in Neural Signal Processing 


Consider a denumerable collection of functions. Construct
the denumerable collection of functions where, for all

i £ Tit — Diu. It is then easy to establish that if all functions in the 
collection are local and the collection is a frame, then so is the
constructed collection. Referring to the discussion on locality in the
earlier sections of this chapter, it is evident that a denumerable
collection of (domain) shifted and scaled derivatives (of second, or
higher, order) of activation functions that are continuous and satisfy
the axiom of discrimination satisfies the property of a frame. In
addition, the functions in the collection are admissible wavelets.

Recall the representational scheme in Equation 6.22 (p. 359). In
view of the preceding discussion on the nature of the 'reproducing
feedthrough measurement kernel basis' functions,
the measurement functions in any layer of a multi-layered neural signal
processor represent the concept (signal) incident on that layer in a
basis that is drawn from a wavelet frame. This characterization of the
nature of representation in neural signal processors is unaltered on an
incorporation of the integral transform of measurement due to lateral
interaction in the expression for the measurement functions.

In a manner analogous to the measurement functions, the aggregates
(responses) of any layer in a multi-layer neural signal processor are a
representation, in a basis drawn from a wavelet frame, of the decisions
taken on the values of the measurement functions of that layer. In both
cases, if the collection of 'reproducing kernel basis' functions forms a
dual frame, then the nature of representation in neural signal
processors suggests similarities with the representational nature in the
conventional approach to signal processing. The measurement functions,
in any layer, are a weighted average of wavelet transforms of the
incident concept and the response, in any layer, of the neural signal
processor is a weighted average of reconstructions, through inverse
wavelet transforms, from the decisions taken on the measurement
functions of the corresponding layer. This characterization of the
nature of representation is specific to type-1 neural signal processors.
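The reading of a measurement as a weighted average of wavelet transforms can be sketched for a one-dimensional signal. Everything specific below is an assumption made for illustration: the basic wavelet is taken to be the second derivative of $\tanh$, the atoms use dyadic scales with integer shifts, the inner products are Riemann sums on a truncated grid, and the weights passed to `measurement` are hypothetical.

```python
import math


def psi(x):
    """Assumed basic wavelet: second derivative of tanh (odd, zero mean)."""
    return -2.0 * math.tanh(x) / math.cosh(x) ** 2


def coefficient(signal, grid, scale, shift):
    """Riemann-sum inner product of the signal with a scaled, shifted wavelet."""
    dx = grid[1] - grid[0]
    norm = 1.0 / math.sqrt(scale)
    return dx * sum(s * norm * psi((x - shift) / scale)
                    for s, x in zip(signal, grid))


def measurement(signal, grid, atoms, weights):
    """Weighted average of wavelet coefficients over (scale, shift) atoms."""
    total = sum(w * coefficient(signal, grid, a, b)
                for w, (a, b) in zip(weights, atoms))
    return total / sum(weights)


GRID = [-30.0 + 0.01 * k for k in range(6001)]
ATOMS = [(2.0 ** j, float(b)) for j in (-1, 0, 1) for b in (-2, -1, 0, 1, 2)]

constant = [1.0 for _ in GRID]
edge = [1.0 if x >= 1.0 else 0.0 for x in GRID]

# Response of the unit-scale atoms to an edge located at x = 1.
edge_response = {b: coefficient(edge, GRID, 1.0, b)
                 for b in (-2.0, -1.0, 0.0, 1.0, 2.0)}
strongest_shift = max(edge_response, key=lambda b: abs(edge_response[b]))
```

A constant signal yields coefficients that are numerically zero for every atom, and an edge yields its largest response at the atom centred on the edge. This is the local, feature-selective behaviour that the wavelet interpretation attributes to the measurement functions.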

The above scheme can be generalized to allow the 'reproducing ker- 
nel basis' functions to be chosen to incorporate different directions of 
differentiation with different indices as indicated, in the case of mea- 
surement kernels, in the following expression. 






for all $\ell = 1, 2, \ldots$, and for some values of $j = 1, 2, \ldots$. While the
nature of representation is unaltered, the underlying wavelet frame in
which the 'reproducing kernel basis' functions are represented
incorporates rotations over the domain in addition to scaling and
translation. A point of interest in the directional derivatives of neural
signal processors is that they are wavelet window functions whose
integrals are local (window) functions. Such a family of wavelets has
been considered, in the literature, essential in a representation of
signals with translation invariance.


6.5 Summary 

Point-wise nonlinear associations between integral transforms
characterize the representational paradigm of neural signal processors.
Localization in the predicates (concepts) realized by neural signal
processors has been studied at two levels, one due to the nature of
kernels used in the integral transforms and the other due to the
nonlinear association between integral transforms. Though the kernels
effective in neural signal processors between the inputs and features as
well as between the decisions and concepts in multi-layered neural
signal processors are, in general, nonlinear, the realization of the
kernels of measurement and aggregation integral transforms as cascades
of nonlinearly operated kernels of linear integral transforms has
allowed the study of localization due to kernels to be reduced to a
study of localization in isolated neurons, which is easily extendible to
a study of localization in neural signal processors of type-1.

Localization in isolated neurons and, thereby, due to kernels of type-1
neural signal processors, studied in the case of processors defined on
function (infinite dimensional vector) spaces, has been shown to be a
simple consequence of the observation space (i.e., pattern space) being
an inner product space. In order that functions represented in isolated
neurons be non-trivial, I have shown that the weight has to be in the
linear span of a, possibly non-finite, collection of localized functions
with vanishing asymptotes. The linear span of window functions, for
every choice of window function, is dense in $L^p(\Re)[\mu]$,
$p = 1, 2, \ldots$, and any finite measure $\mu$, as a simple consequence
of Proposition 5.1.2 (p. 247). Dependence of features, responsible for
neural response, on a local region of the incident input pattern is,
thereby, assured for weights that satisfy the conventional criterion of
physical realizability, that of integrability of order
$n = 1, 2, \ldots$: in information processing contexts, square
integrability, based on an equation of $L^2$ norms with energy, is
considered adequate for realization.

Membership of the kernels of integral transforms related to
measurement in a linear span of window functions has been shown to
assure a localization in the support of the response of type-$k$ neural
signal processors. In addition, the kernel of aggregation integral
transform that is a linear combination of window functions ensures
boundedness in processor response. Localization in the support is not
restricted to neurons defined on function spaces and is easily seen in
the case of processors defined on finite-dimensional spaces: the
analysis has been conducted mainly on infinite-dimensional spaces so
that the requirement of weight belonging to the linear span of window
functions is easily appreciated. Similar to the localization induced by
kernels, localization is introduced in the support of neural responses
by activation functions that are in the linear span of window functions:
of the common choices of activation functions, Gaussian functions are
window functions while sigmoidal functions, by virtue of not being in
$L^p[\mu]$ for any finite measure $\mu$, $\mu \neq 0$, cannot be
represented as finite linear combinations of window functions.

Localization induced by activation functions has been studied through
a characterization of directional derivatives of neural response.
Activation functions satisfying the requirements indicated in the axiom
of discrimination (typical examples are sigmoidal, including
hard-limiter, and Gaussian functions) have been shown to exhibit the
feature that all derivatives, if they exist with smoothness at the
asymptotes, are window functions: this result, though established only
for continuous activation functions, is easily extendible to
discontinuous functions through the theory of generalized functions
(Hoskins, 1979); this extension has not been considered in the scope of
this thesis. This property of activation functions, together with
linearity of inner products, assures that the directional derivative of
neural response is expressed as a linear combination of window functions
modulated by another localized function and, hence, localization is
induced in the support of the directional derivatives.

Predicates (functions) represented by neural signal processors are
local in the sense of being appropriate linear combinations of window
functions and the extent of localization in the support of neural
response has been identified through an analysis common in studies of
function representation through window transforms. The derivatives of
activation functions that are continuous and piece-wise monotonic are
shown to be window functions with at least the zeroth moment vanishing,
the basic requirements of wavelet windows: the first derivative might
not be a wavelet window for all functions, e.g., the first derivative of
a sigmoidal function is a window function but not a wavelet window.
Function representation in neural signal processing, in the light of the
above analysis, compares with that in conventional signal processing;
however, in neural signal processing, it is not impossible to express
signals and their processors in a common framework, typically that
provided by the theory of wavelet transforms. Representation in neural
signal processors through wavelet transforms is, however, not considered
within the scope of this thesis.

Concepts represented in neural signal processors inherit the localized
nature of the constituent predicates. As the localization in predicates
is traced to kernels (of measurement and aggregation integral
transforms) and activation functions, isolated consideration of the
sources of localization shows that the predicates represented in neural
signal processors evaluate intra-pattern and inter-pattern features; the
latter, in general, exhibit a directional dependence. The concepts
realized in neural signal processors are derived from decisions taken on
features of input space members and localization in the predicates has
been shown to restrict the concept represented by each processing node
to localized regions in the sheaf of input patterns.



370 


Chapter 6 Localization in Neural Signal Processing Architectures 


The choice of kernels in neural signal processors influences the
nature of representation and the nature of information stored in the
interconnection strengths. A discussion based on kernels that are of the
reproducing type shows that the interconnection strengths in neural 
signal processors are related to the sampling functions associated with 
the Hilbert space for which the kernel is reproducing. This nature of 
information storage suggests that conventional neural networks are 
capable of representing continuous concepts (signals) and processors 
defined over such input spaces. The nature of representation in neural 
signal processors has been shown to involve a function realization in 
a basis which is itself synthesized in a wavelet frame. Measurement 
functions are shown to be weighted averages of representations of the 
incident concept in a basis (of linearly independent functions) drawn 
from a wavelet frame. Similarly, the responses of neural signal pro- 
cessors are shown to be weighted averages of reconstructions, through 
inverse wavelet transforms, from decisions taken on the values of the 
measurement functions. 

Parallel distributed processing, in the light of the above
characterization of concepts, is interpreted to mean the following.
Distribution of representation, based on the restriction of the
functionality of each processing node to local regions of the input
space and a restriction of concept (function) synthesis to a local
collection of decisions, is the synthesis of concepts, interpreted as
neural signal processor responses, as localized evaluations of decisions
on features, each of which is a measurement on local regions of the
space of input patterns; localization is aimed at a discovery of
features specific to the incident patterns and features common to a
collection of patterns. Parallelism in representation is the requirement
of simultaneous evaluation of predicates over different local regions of
the input space; the simultaneity is necessitated more by the need to
test the competing hypotheses engendered by localization of
representation than by the urge to gain an advantage in the complexity
of computation.
We are agreed that with our puny intelligence and understanding
we can only venture so far in the great mysteries that
confront us on all sides in trying to account for everything in 
existence and experience. 

— Karl Raimund Popper and John Carew Eccles 
in The Self and Its Brain , 
Springer International, Berlin, 1977, 
In this thesis, the chief concern has been a study of certain issues
involved in the representation of signal processors in neural networks.
The focus has been one of representing signal processors, abstracted to
have a meaning of functional associations on the space of signals, in
the generic framework of neural networks, however, without imposing any
specific meaning to the nature of association or the class of signals.
Representation of information (signal) processors has been studied,
equivalently, as function representation: different aspects of
representing functions have been considered. The functions for which a
representation is sought through neural processing ensembles are assumed
to be defined on a multi-dimensional space and to be assigned values in
a, possibly different, multi-dimensional space.

Philosophical issues involved in the connectionist approach to infor- 
mation processing and a review of the historical and thematic aspects 
of the methodological issues in the representation of signal processors 
through neural networks with the aim of realizing perceptually relevant 
information processing are presented in Chapters 1, 2 and Appendix A. 
The relevance of connectionist information processing in the context of 
an automated material handling, in particular the automated handling 
of information, and the scope of activity under the banner of artificial 
neural networks have been elaborated in Chapter 1. An attempt has 
been made to develop the issues relevant in a study of neural signal 
processing in Chapter 2. 




Chapter 7. Concluding Remarks 


375 


Representation of processors in isolated neurons has been investigated
in Chapter 3; this chapter has focused on processors defined on discrete
spaces. The existence of weights that preserve all points of certain
discrete spaces has been shown and, conversely, the existence of
discrete preservance input spaces corresponding to every non-null weight
has been established. Preservance has also been shown to be independent
of the radix of numbering and invariant to scaling and translation of
the discrete spaces. Function representation on these discrete spaces
has been reduced to sequence realization and this aspect has been
incorporated in reducing learning to a procedure involving an
enumeration of weights and a search, in a linearly ordered space, for
the threshold. Generalization, in the sense of function extension, has
been shown to influence learning by altering the function representation
and, thereby, the possibility of enumerating admissible weights.
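The flavour of this reduction can be sketched under a stated assumption. This summary does not define preservance, so the sketch assumes that a weight "preserves" a discrete space when distinct points yield distinct inner products with it; the radix weight, the example labellings and all function names are hypothetical and are not the construction of Chapter 3. Under that assumption a function on the discrete space becomes a sequence along a line, and finding a threshold is a scan of a linearly ordered list.

```python
from itertools import product


def radix_weight(dimension, radix):
    """Weight (1, r, r^2, ...): reads a point as a number written in radix r."""
    return [radix ** k for k in range(dimension)]


def measure(weight, point):
    return sum(w * x for w, x in zip(weight, point))


def find_threshold(weight, points, labels):
    """Scan the linearly ordered measurements for a separating threshold."""
    ranked = sorted((measure(weight, p), y) for p, y in zip(points, labels))
    for i in range(1, len(ranked)):
        if all(y == 0 for _, y in ranked[:i]) and all(y == 1 for _, y in ranked[i:]):
            return (ranked[i - 1][0] + ranked[i][0]) / 2
    return None   # no single threshold realizes this labelling with this weight


RADIX, DIM = 3, 3
points = list(product(range(RADIX), repeat=DIM))
weight = radix_weight(DIM, RADIX)
values = [measure(weight, p) for p in points]

# The same weight still separates the points after scaling and translation.
moved = [tuple(2 * x + 5 for x in p) for p in points]
moved_values = [measure(weight, p) for p in moved]

top_digit = [1 if p[2] == RADIX - 1 else 0 for p in points]
parity = [sum(p) % 2 for p in points]
```

With this weight the labelling `top_digit` is realized by a single threshold, whereas `parity` is not. Whether a labelling admits a threshold depends on how the chosen weight orders the points, which is why the learning procedure described above also involves an enumeration of weights.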

Processor realization through layered neural processing schemes 
has been investigated in Chapter 4: this chapter continues to focus on 
processors defined on discrete preservance input spaces. The adequacy 
of single layered neural signal processors for realizing all functions of 
interest has been discussed and a suggestion for architectures minimal 
to a given processor realization situation has been made. Learning 
has been shown to involve an identification of an appropriate discrete 
input space given a training set, analytical assignment of admissible 
weights, a search for threshold in a linearly ordered space and an
analytical assignment for the coefficients of linear combination of
neural decisions. Representation in multi-layered neural signal
processors has been shown to be of the same nature as in single-layer
processors, though with the possibility of a deployment of fewer neurons. An
investigation into the realization of mappings between symbol spaces 
through neural signal processors has facilitated an algebraic character- 
ization of the notion of linear separability: a dichotomy over a symbol 
space, itself embedded in a (semi) lattice, is linearly separable if each 
component in the partition induced by the dichotomy on the symbol space 
is a sub semi-lattice. 

The representation of abstract processors with a view to understanding
the representational paradigm of neural signal processors has been
studied in Chapter 5. Four axioms have been suggested for neural 
signal processing to aid a better understanding of the mechanism of 
representation. These axioms are sufficiently general to aid a unified 
study of neural signal processing architectures. 


1. Axiom of Organization. A neural signal processor is composed of 
(layers of) three operational stages: measurement, discrimination 
and aggregation in that order. Preprocessing, if any, (preceding, 
or incorporated in, the measurement) is sought to be represented 
in a neural basis. Measurements are effected on an observation 
space constructed as the Cartesian product of the input space and 
a relevant subspace of a union of the space of responses of the 
distinct layers. 
2. Axiom of Measurement. A neural signal processor, through the
measurement functions in each of the processing (decision mak- 
ing) nodes, induces a foliation, of codimension at least one, in 
the input manifold. This foliation forms the basis of synthesizing 
(approximating) the desired level curves of the function. 

3. Axiom of Discrimination. A neural signal processor, through its
discriminatory functions, renews the foliations, induced on the input
space by the measurement functions, through a transformation, 
of the stems of the foliations, with at least one of the following 
properties: 

(a) alter the indexing of leaves to retain distinctness in a finite 
non-zero number of local regions of the input space, 

(b) introduce multiple components in the leaves, 

(c) associate, to at least one component of a leaf of the folia- 
tion due to discrimination, uncountably many leaves of the 
foliation due to measurement. 

Re-foliations provide the basis for establishing equivalences be- 
tween members (elements) of the input space in ways not possible 
through the chosen measurement functions. 

4. Axiom of Aggregation. A neural signal processor, through its aggre- 
gation function, synthesizes (or approximates) the level regions of 
processor response through a foliation on the Cartesian product 
of the stems of foliations on the input space due to discrimination. 
Concepts, in neural signal processors, are identified with the level
regions of processor response. 
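The three operational stages named in the axioms can be sketched for a single layer on a finite-dimensional input space. This is a minimal illustration under simplifying assumptions: measurements are inner products minus a threshold, discrimination is a point-wise $\tanh$, aggregation is a linear combination, and all names and numerical values are hypothetical.

```python
import math


def measurement(weights, thresholds, x):
    """Measurement stage: one inner product (minus a threshold) per node."""
    return [sum(wi * xi for wi, xi in zip(w, x)) - t
            for w, t in zip(weights, thresholds)]


def discrimination(measurements, activation=math.tanh):
    """Discrimination stage: point-wise nonlinearity on each measurement."""
    return [activation(m) for m in measurements]


def aggregation(coefficients, decisions):
    """Aggregation stage: linear combination of the decisions."""
    return sum(c * d for c, d in zip(coefficients, decisions))


def layer_response(weights, thresholds, coefficients, x):
    """Measurement, discrimination and aggregation, in that order."""
    decisions = discrimination(measurement(weights, thresholds, x))
    return aggregation(coefficients, decisions)


# Two inputs on the same leaf of the measurement foliation (same inner
# product with the weight) cannot be told apart by the node.
one_node = ([[1.0, 1.0]], [1.0], [2.0])
same_leaf = (layer_response(*one_node, [0.2, 0.8]),
             layer_response(*one_node, [0.5, 0.5]))

two_nodes = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, -1.0])
response = layer_response(*two_nodes, [1.0, 0.0])
```

In this sketch the level sets of each inner product are the leaves induced by measurement, the activation re-indexes those leaves, and the aggregation combines the re-indexed leaves into level regions of the response.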


Point-wise associations, generally nonlinear, between integral
transforms have been suggested as the representational paradigm of neural
signal processors. This interpretation allows a unification of neural
signal processing with conventional signal processing: it is not incor- 
rect to suggest that these approaches are complementary as neural 
signal processing is based on a search for kernels, the mechanism of 
association between integral transforms being invariant while conven- 
tional signal processors effect a realization through a search for an 
appropriate association mechanism, the nature of integral transforms 
being independent of the processor family. The kernels of the integral 
transforms of measurement and aggregation have been related to the 
class of kernels for nonlinear Urysohn operators and a few represen- 
tational features of architectures incorporating the axioms of neural 
signal processing have been investigated. A study of the representa- 
tion of activation functions that are continuous and satisfy the axiom 
of discrimination has shown that superpositions of functions with a 
permutation of weights are related to the issue of representation in 
architectures involving non-sigmoidal activation functions. This study 
provides an insight into the nature of representation in an ensemble of 
neurons wherein the weights in the nodes of a common layer are related 
to each other through permutation operations. 
Localization characteristics in function representation through
neural signal processors have been investigated in Chapter 6. Physical
requirements of function realization have shown that the kernels of
measurement and aggregation integral transforms are to be in the linear
span of window functions. The directional derivatives of the neural
signal processor response have been shown to belong to the linear span
of (suitably chosen) window functions. A characterization of the
spectral localization has shown that the kernels of measurement and
aggregation as well as the directional derivatives of neural signal
processor response have a wavelet representation, thereby allowing the
possibility of a common framework for representation of signals and
systems, an invaluable feature in the context of formulating Universal
Neural Networks. Localization in the functionality of neural signal
processors implies that the decisions evaluated by neural networks
jointly exhibit the characteristics of intra-pattern predicates and
inter-pattern predicates; the latter are, however, of a directional
nature. Concepts in neural signal processors have been shown to
represent evaluations over a localized region in the 'sheaf of input
patterns.'

The characterization, in Chapter 5, of representation in neural
networks as (nonlinear) point-wise associations between integral
transforms points to the important issue of the influence of kernel
characteristics, especially localization, on representation. As a
specific example, I have considered the class of neural networks wherein
the kernels of the integral transforms of measurement and/or aggregation
are of the reproducing type. These kernels extend the notion of
preservance, applicable to processors defined over discrete spaces, to
functions that are defined over continuous input spaces as well.
However, the notion of preservance is not the same as representation
under kernels of the reproducing type: the distinction stems from the
requirement of symmetry in reproducing kernels.

Representation in neural signal processors with kernels that are of 
the reproducing type has been shown to be equivalent to a processing 
situation wherein the measurements are reconstructions, of an inci- 
dent concept (signal), through finitely many samples of the concept. 
Finitely many samples are adequate when the incident concepts belong 
to the class of localized signals and the samples are not restricted to 
be uniformly spaced. The weight vectors (weighting functions) of dis- 
tinct neurons have been shown to be the distinct sampling functions 
associated with the Hilbert space for which the kernel, composed of the 
weight vectors (weighting functions), is a reproducing kernel. While 
the measurement kernels are easily admissible as reproducing kernels, 
the kernels of the integral transforms of aggregation are allowed to be 
of the reproducing type only if the activation functions associated with 
the aggregation integral transforms are integrable. 
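As a rough illustration of reconstruction through finitely many, non-uniformly spaced samples, the sketch below interpolates a localized signal in the Hilbert space of a Gaussian reproducing kernel. The Gaussian kernel, its width, the sample locations, the test signal, and all function names are assumptions made for illustration; they are not the kernels constructed in this thesis.

```python
import math

def gaussian_kernel(s, t, width=0.5):
    """Symmetric, positive-definite kernel (an assumed choice)."""
    return math.exp(-((s - t) ** 2) / (2.0 * width ** 2))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def reconstruct(sample_points, sample_values, width=0.5):
    """Return f_hat(t) = sum_i c_i K(t, t_i), chosen to match the samples."""
    gram = [[gaussian_kernel(s, t, width) for t in sample_points]
            for s in sample_points]
    coeffs = solve(gram, sample_values)

    def f_hat(t):
        return sum(c * gaussian_kernel(t, ti, width)
                   for c, ti in zip(coeffs, sample_points))
    return f_hat

def signal(t):
    """A localized test signal (hypothetical)."""
    return math.exp(-t * t) * math.cos(2.0 * t)

# Non-uniformly spaced sample locations (arbitrary choice).
points = [-2.0, -1.3, -0.7, -0.2, 0.1, 0.6, 1.1, 1.9]
f_hat = reconstruct(points, [signal(t) for t in points])
```

Each term `K(t, t_i)` plays the role of a sampling function attached to one sample. The reconstruction agrees with the signal at the sample locations; how well it does between them depends on the assumed kernel and its width.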

An implication of a characterization of neural network operation
in terms of point-wise (nonlinear) associations between integral transforms
is that the issue of learning (and generalization) is equated to a
design, or selection, of the kernels of the integral transforms of measurement
and aggregation. The uniqueness of the reproducing kernel
for a given Hilbert space and the decomposition of reproducing kernels
in a basis of linearly independent vectors (functions) allow the nature
of representation in neural signal processors to be precisely established:
as a consequence, the measurements in any layer are representations
of the incident concepts and, in a similar way, the responses (aggregates
of decisions) of a neural signal processor are representations of
the decisions taken on measurements. Formally, the representational
nature, in neural signal processors, has been shown to be in the sense
of a weighted average of wavelet transforms.

The study of localization in neural signal processors has shown that
as the depth of layering increases, so does the degree of localization, ie,
the effective receptive fields shrink with the depth of layering. While
this statement has been established, largely, for activation functions
that are of the sigmoidal type, the denseness of finite linear combinations
of (domain) shifted and (domain) scaled sigmoids assures that
this property is true of neural signal processors incorporating activation
functions that are continuous and satisfy the axiom of discrimination.
The nature of representation explored in this thesis allows the following
characterization of representation in neural signal processors to be
conjectured: 'shallow' networks are well suited for
representing processors that have formal descriptions, whereas 'deep'
networks are necessary when the entities operating in a formal system
need to be identified/discovered. Based on the characterization
in Chapter 6, it would not be incorrect to suggest that symbolization
(or 'symbol synthesis') involves a process of identifying, or discovering,
local regions in the sheaf of input patterns, and the means of recognizing,
either in isolation or in conjunction with other symbols (local
regions in the sheaf of patterns), and establishing associations between
symbols: the latter requirements imply a recursive usage of the object
and meta-language level constructs of symbols.

In the present investigations of neural networks, the architectures
are predominantly of the 'shallow' kind, ie, relatively few layers are
deployed, each with a large number of massively interconnected processor
ensembles.^ Architectures that are 'deep' are made of a large
number of layers, each with sparsely connected processor ensembles.
Such network structures will aid in implementing, in a neural basis, the
requisite preprocessing (symbolization) of available signals. In terms
of the axiom of measurement, symbol processors are characterized by
foliations whose leaves have a relatively lesser curvature in comparison
with the leaves of the foliations associated with processors that are
involved in 'symbol synthesis.' As expected, the representation of a process
of 'symbol synthesis' is intractable compared to the representation
of 'symbol manipulation' processes. Present neuro-anatomical evidence
does not seem to refute the above conjecture. The cortex and neo-cortex,
the seat of (conscious) symbolic activity, are organized as 'shallow' processors.
In contrast, the mid-brain, the relatively less explored region of
mental activity, is composed of 'deep' networks: this region is believed
to be responsible for the (subconscious) associations that are related to
long term memory traces.

^Typical examples are the network structures of Hopfield, Kohonen, Grossberg, etc,
which have a completely connected graph.

To sum up, the key findings of the attempt, in this thesis, at a char- 
acterization of representation of (signal) processors in the connectionist 
approach to computing are as listed in the following: no claim, however, 
is made as to the exhaustiveness of this study.


1. The interconnection strengths between processing nodes in an ensemble
of (interconnected) neurons store knowledge of association,
between inputs and outputs, by accommodating a preservation of
structural regularities in the members of a certain discretely sampled
subset of the input signal (pattern) space. When the input
space is embedded in a Euclidean space of finite dimensions, the
weights have been shown to preserve uniqueness and relative order
between (input) vectors of suitably chosen discrete subsets of
the input space. In contrast, the interconnection strengths between
processors relate, in the case of a signal space consisting
of continuous and localized signals (functions) on a continuous
domain and embedded in a Hilbert space, to sampling functions
associated with the reproducing kernel of the input space. The
natures of information storage, in the interconnection strengths,
under the notions of preservance and the reproducing property of
kernels are related, in the sense that there exists a non-empty
overlap between the space of kernels of the reproducing type
and the collection of kernels^ constructed through preservance
weights; however, these notions are neither identical, nor is one
notion reducible to the other.

2. In both forms of information storage, ie, the association of weights
to input spaces through preservance and the selection of kernels
that are reproducing in nature with respect to the input space, a
knowledge of the structure in the discrete sampled subset of the
input space is central to the issue of learning (and generalization),
essentially a problem of kernel design. The representation
of processors (functions), in the case of input spaces embedded in
finite dimensional Euclidean spaces, has been shown to reduce
to a process involving an identification of the preservance weight
appropriate to the collection of inputs specified through the training
set (ie, repertoire of examples), and an enumeration of the
weights, in the first layer, in the class of preservance weights for
the preservance input space. In contrast, processor (function) representation
on signal spaces that are embedded in a Hilbert space,
of continuous signals defined on a continuous domain, involves a
synthesis of a collection of linearly independent 'basis' functions
through which the kernels, of a reproducing nature, are realized:
these 'basis' functions, shown to belong to a wavelet frame, punctuate
the character of representation in neural signal processors.

^Kernels constructed through preservance weights are not restricted to exhibit symmetry.
An imposition of the requirement of symmetry implies that preservation is effected
not only in the measurement operation, but also in aggregations.

3. Operationally, the interpretation of neural signal processors effecting
point-wise (nonlinear) associations between integral transforms,
in the context of kernels that are chosen to be of the reproducing
type, signifies that the functional character of representation
in neural signal processors is one of establishing nonlinear
associations between reproducing kernel Hilbert spaces. This
characterization of representation in neural signal processors is
of particular importance in neural networks organized to have a
finite number of nodes, each operating on a finite number of inputs.
In such networks, the issue of learning (and generalization)
is not merely one of realizing the nonlinear association through
an appropriate composition of suitably selected layers of neural
processing, but also involves the crucial issue of a representation
of the input signal space through adequately chosen sampling
functions. The nature of representation of the input space in the
measurement functions has been related, in this thesis, to the
representation offered by finite linear combinations of members
of a wavelet frame. I have also shown that the nature of representation
in neural signal processors allows a common framework for
a characterization of the representation of signals as well as processors,
thereby suggesting the feasibility of an inquiry into the
theory of computation in the formalism of neural networks: in
view of the distinct differences between the formalisms of Neural
Networks and Turing Machines, the theory of computation in neural
networks can be expected to be different from that currently
established in the context of Turing Machines.^

4. The connectionist approach to representation of (information) processors
is not restricted to the realization, or approximation, of
functions on suitably defined spaces of numbers. An algebraic
characterization of the principle underlying the basic processing
unit, ie, the notion of linear separability, has been provided to
show that categorization of the linearly separable kind partitions
a (semi) lattice into sub semi-lattices. This characterization provides
the key to designing schema of 'linear categorizers' on symbol
spaces. Neural networks need not be restricted to be organized
as a schema of interconnected 'linear categorizers.' I have
suggested four axioms that capture the essence of the prevalent
architectural varieties in the connectionist approach to (information)
processor representation. Of these, the axiom of measurement
relates the representation of the given input signal (pattern)
space with that of the nonlinear association between the input
and output spaces. In a similar way, the axiom of aggregation
(measurements are not unrelated to the aggregations) relates the
representation of association between the input and output spaces
with that of the desired output signal (pattern) space. The axiom
of discrimination, through a consideration of the representation
of (nonlinear) associations between the entities that represent the
input and output signal spaces, links the isolated representations
of the input and output signal spaces.

^Note that in neural networks, the focus is one of seeking a representation given
'adequate' examples. It is of interest to investigate the nature of computation in neural
networks vis a vis the currently accepted notion of computing in the framework of Turing
Machines. This investigation is particularly relevant in inquiries seeking the ability of
neural networks to represent decisions related to the decision making of neural networks:
this inquiry needs the formulation of universal neural networks.
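One possible reading of the statement, in item 4, that linearly separable categorization partitions a (semi) lattice into sub semi-lattices can be checked on a small case. The sketch below assumes the Boolean cube ordered componentwise and a threshold unit with non-negative weights; the weights, the threshold, and the function names are hypothetical, and the algebraic characterization given in the thesis may be more general than this special case.

```python
from itertools import product

def categorize(weights, threshold, x):
    """Linear threshold unit: True when the weighted sum reaches the threshold."""
    return sum(w * xi for w, xi in zip(weights, x)) >= threshold

def join(x, y):
    """Componentwise maximum (lattice join on the Boolean cube)."""
    return tuple(max(a, b) for a, b in zip(x, y))

def meet(x, y):
    """Componentwise minimum (lattice meet on the Boolean cube)."""
    return tuple(min(a, b) for a, b in zip(x, y))

def closed_under(points, op):
    """Check that op(x, y) stays inside `points` for every pair."""
    members = set(points)
    return all(op(x, y) in members for x in members for y in members)

# Assumed example: non-negative weights on the 3-dimensional Boolean cube.
weights, threshold = (2, 1, 3), 3
cube = list(product((0, 1), repeat=3))
upper = [x for x in cube if categorize(weights, threshold, x)]
lower = [x for x in cube if not categorize(weights, threshold, x)]

print(closed_under(upper, join), closed_under(lower, meet))
```

With non-negative weights, raising any component cannot lower the weighted sum, so the accepted class is closed under joins and the rejected class under meets. Neither class need be closed under the other operation, which is why each is only a semi-lattice.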

In Chapter 3, the notion of preservance has facilitated a precise
characterization of the number of functions of a specified order of separability
(eg, linearly separable functions) that can be realized in any
preservance weight 'direction' given the dimensionality of the input
space and the index of ranking in the preservance input space. This result
cannot, however, be readily used to state the exact number of functions
of a specified order of separability on a preservance input space
of given dimensionality and ranking even though the finite number
of preservance weight 'directions' given the input space dimensionality
is precisely known. This limitation ensues in view of the fact that
the algebraic characteristics of the class of preservance weights are
not completely known. It is, thereby, imperative that such a characterization
of the class of preservance weights be investigated. This
characterization will also enable a better appreciation of the issue of
learning and generalization in layered neural signal processors defined
on preservance input spaces.
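The formal definition of preservance belongs to Chapter 3 and is not repeated here. The sketch below assumes only the property summarized earlier in this chapter, namely that a preservance weight assigns distinct scalars to the vectors of a discrete input set while keeping a relative order between them. The mixed-radix weight vector, the lexicographic order, and the function names are illustrative assumptions, not the construction used in the thesis.

```python
from itertools import product

def radix_weights(levels, dim):
    """Weights (levels^(dim-1), ..., levels, 1): one assumed weight 'direction'."""
    return [levels ** (dim - 1 - k) for k in range(dim)]

def project(weights, x):
    """Scalar measurement w . x of an input vector."""
    return sum(w * xi for w, xi in zip(weights, x))

levels, dim = 3, 3
# product() enumerates the grid {0, 1, 2}^3 in lexicographic order.
grid = list(product(range(levels), repeat=dim))
w = radix_weights(levels, dim)
images = [project(w, x) for x in grid]

# Distinct inputs map to distinct scalars, and the order is kept.
print(len(set(images)) == len(grid), images == sorted(images))
```

Under these assumptions a function on the multi-variate grid can be studied as a sequence indexed by the scalar images, which is the sense in which the text speaks of sequence realization on univariate spaces.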

The characterization of representation, in Chapter 4, of functions in
neural signal processors defined over preservance input spaces was restricted,
in processors of the multi-layered variety, to the case wherein
only the weights in the first layer have the interpretation of being preservance
weights of the preservance input space. While such a restriction
simplifies the representation of functions on multi-variate input spaces
to a situation of sequence realization on univariate spaces, the notion
of preservance has not yet been fully exploited. An investigation into
the nature of representation in multi-layered neural signal processors
wherein the weights of the nodes in each layer are preservance weights
of the collection of vectors formed by the responses of the nodes in the preceding
layer is of interest to seek the computational advantages such
a scheme offers in the context of learning and generalization. In addition,
the inquiry in Chapter 4 has not been aimed at processors whose
weights are drawn from rotated versions of the collection of preservance
weights associated with some other node in the same layer: however,
in Chapter 5 this problem is related, cursorily, with the issue of realizing
activation functions satisfying the axiom of discrimination as finite
linear combinations of shifted sigmoidal functions.

An adequate exposition outlining the nature of a theory of represen- 
tation in neural signal processors does not follow the axioms of neural 
signal processing. The lack of such an insight, in Chapter 5, stems 
from the fact that while the investigation of foliations, in the context 
of category theory, is substantial, there does not seem to be enough 
characterization of foliations in terms of their stem (in fact, the set that
supports an indexing of the leaves of a foliation does not seem to have 
been given enough consideration in the investigations). In order that 
the full potential of the axioms of neural signal processing be under- 
stood, it is essential to investigate foliations to provide the required 
characterization in terms of the stem. 

Point-wise (nonlinear) associations between integral transforms, as
a characterization of the operation of neural networks, allow an appreciation
of the functionality of neural networks in terms of the kernels.
The incorporation of available knowledge through kernels inspires
newer issues, the principal one being the influence of correspondences 
(or correlations) in the weights (weighting functions) of distinct nodes 
on the representation potential. The reproducing property in the ker- 
nels, a situation that restricts the choice of weights of the incoming 
channels in a node to be the same as the interconnection strengths of 
the outgoing channels in the corresponding node of the previous layer, 
has been shown to state the nature of representation in terms of sam- 
pling functions. In order to aid a better understanding of the nature 
of representation in neural signal processors it is necessary to con- 
tinue the characterization of representation in neural signal processors 
wherein the kernels are realized through the responses of some other, 
appropriately chosen, neural signal processors. This investigation will 
need a substantial incorporation of the representation characteristics 
suggested by the axioms of neural signal processing. 

New Directions


In Chapter 6, the derivatives of sigmoidal activation functions, and
thereby all activation functions that are continuous and satisfy the
axiom of discrimination, have been shown to be localized functions satisfying
the requirements of window functions. The first derivative of
the sigmoidal functions is of specific interest as this functional form, the
square of the hyperbolic secant function, has been extensively used in
the study of nonlinear evolutionary systems, typically nonlinear wave
propagation and interaction between traveling waves. In these studies,
the square of the hyperbolic secant^ is a basic solution of the popular
form of the Korteweg-de Vries (KdV) equation and all solutions of
this equation are termed solitons (more precisely solitary waves). (See
Lamb, 1980, Rajaraman, 1982, Drazin, 1983 and Drazin & Johnson,
1989 for the notion of solitary waves and solitons.)
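The two facts used in this paragraph can be checked numerically. The sketch below assumes the logistic sigmoid 1/(1 + exp(-x)) as the sigmoidal function and the normalization u_t + 6 u u_x + u_xxx = 0 for the KdV equation; other sigmoids and other normalizations change the constants but not the sech-squared form. The function names and the finite-difference step are illustrative choices.

```python
import math

def sigmoid(x):
    """Logistic sigmoid (assumed form of the sigmoidal activation)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def sech2(x):
    """Square of the hyperbolic secant."""
    return 1.0 / math.cosh(x) ** 2

def soliton(x, t, c=1.0):
    """Travelling wave u = (c/2) sech^2(sqrt(c)/2 * (x - c t)) with speed c."""
    return 0.5 * c * sech2(0.5 * math.sqrt(c) * (x - c * t))

def kdv_residual(x, t, c=1.0, h=1e-2):
    """Central-difference estimate of u_t + 6 u u_x + u_xxx at (x, t)."""
    def u(a, b):
        return soliton(a, b, c)
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    u_xxx = (u(x + 2 * h, t) - 2 * u(x + h, t)
             + 2 * u(x - h, t) - u(x - 2 * h, t)) / (2 * h ** 3)
    return u_t + 6 * u(x, t) * u_x + u_xxx

for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    # First column: sigmoid'(x) - (1/4) sech^2(x/2); second: KdV residual.
    print(x, sigmoid_prime(x) - 0.25 * sech2(0.5 * x), kdv_residual(x, 0.3))
```

The first difference vanishes up to rounding, since s(x)(1 - s(x)) = 1/(4 cosh^2(x/2)). The residual is small but not exactly zero because the derivatives are approximated by finite differences.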

Lax (1968) has shown that solitons are related, through a squaring
operation, to the eigen functions of a Schrödinger (second order) differential
operator. This aspect has been used in finding the solutions to the
KdV equation through the method of inverse scattering.^ The identicality
of the functional form of the eigen solutions of Schrödinger operators
with the derivatives of the sigmoidal activation functions prompts a natural
curiosity into the viability of the KdV equation governing the operational
aspect of the axiom of discrimination. Of even greater interest
is the applicability of the approach of inverse scattering in providing an
insight into the nature of representation in neural signal processors.
An isolated neuron whose activation function is a soliton has the interpretation
of representing the dynamics of entities propagating in a
rectilinear space-time continuum. Carrying this interpretation over to
neuronal ensembles, a neural signal processor represents the dynamics
of entities traveling in a curved space-time continuum, the curvature
increasing as the degree of layering increases.

^The sigmoidal activation function is also a basic solution of the class of KdV equations.
However, this form of the KdV equation is not frequently used in the study of nonlinear
evolutionary systems. Evolution in systems with solitary waves of the sigmoidal kind
has been considered in the investigations of von Neumann and Ulam.

^The KdV equation is a non-linear evolutionary equation and, hence, the solutions to
the differential equation are not given by the linear span of the basic solutions. Inverse
scattering relates the spatial evolution to the temporal evolution through a linear operator
(this operator is associated with the Bäcklund transformations).

The investigation, in Chapter 6, into the nature of representation 
in neural signal processors has shown the possibility of incorporating a 
common framework in studies related to the representation of signals 
and (nonlinear) associations between signal spaces. A common frame- 
work for representing signals and their processors would be needed in 
studying the learning of neural signal processing through neural sig- 
nal processors, ie, to develop the concept of universal neural automata. 
This exercise will be useful in understanding the limits of representation
through the paradigm of 'learning by examples' and will enable
a formulation of the notion of neural decidability. Further, a study 
of complex network structures, through means not entirely computa- 
tional, will be of help in an identification of network structures that 
would be capable of an automated expression of functional characteristics
that are anthropocentric and anthropomorphic: a significant step
in an attempt to reach the ultimate goal of artificial intelligence. 

On recognizing that neural networks belong to the larger class of 
processing structures involving a collection of functions indexed on lat- 
tice points, the computational basis of neural networks is seen to be 
identical to that used in the formalisms of Turing Machines, Finite 
State Machines, Grammars, Normal Algorithms, Cellular Automata, 
etc. An immediate generalization is to consider function fields over 
partially ordered index spaces. This abstraction raises new questions, 
the most important one of which relates to the interplay between inter- 
function interactions and macroscopic functional specificities. Such 
an interplay will be essential in a study of the cognitive capacity of 
the information processing approaches to automated intelligence. An 
incorporation of partial ordering in the index spaces will enable a mean- 
ingful representation of lists in neural signal processors and, thereby, 
facilitate studies in the understanding of automated processes with 
perceptual relevance. 

All endeavors involving the 'interchange' of 'materials' (abstracted to
include the isolated or combined participation of manifestations of 'matter,'
'energy' and 'information') necessitate operations, or the operational
equivalents, of decision making and/or the recognition of patterns.
Collectively termed 'information processing,' these operations
are sought to accentuate, as inferred (judged) by a participant (observer),
the 'information content' in the signals that facilitate (and,
possibly, necessitate) the 'interchange' of 'materials.'

In this Appendix I will provide a glimpse into the nature of automated
intelligence and outline the two prominent traditions in the automation
of intelligence. Following this I will briefly discuss the need
for nonlinear methods in the processing of signals and will outline some
of the interpretations that have been suggested, in the literature, for the
processing of signals in the connectionist framework. These interpretations
are relevant in understanding the representation potential of
the neural network approach to the automation of intelligence.


A.1 Nature of Automated Intelligence 

Success in the production of sustained energy, which facilitated rapid industrialization,
enabled a shift in investigations towards optimal means
of harnessing available energy in material handling, inventory man- 
agement and coordination of material flow between machines, plan- 
ning and organization, in short, operations research. This focus has 
demanded extensive studies in the understanding (to aid a representa- 
tion) of the nature of data, information, and knowledge, and of methods 
by which data should be organized to support inquiries oriented at 
seeking information required in the development of knowledge. 

It is interesting to note that this course of events was predicted by
von Neumann (cf, Burks, 1970) as evident below.

John von Neumann pointed out that in the past, science had dealt
mainly with the concepts of energy, power, force, and motion, and
he predicted that "in the future science would be more concerned
with problems of control, programming, information processing,
communication, organization, and systems."


The success in automating manufacturing processes with mechaniza- 
tion in the processing of information has triggered a new wave of activ- 
ity, ie, the automation of intelligence. 

Mechanized expression of intelligence is being looked at in the per- 
spective of endowing machines with the ability of identifying objects 
(eg, tools and raw materials), and taking decisions regarding the nature
and extent of processing required on identified objects. Processing
of information, in automated systems, is generally through a variant of 
a hierarchy of (semi) autonomous interconnected processes.^ 

At one extreme lies centralized processing characterized by a single 
process in constant communication with all other processes operating 
as slaves. In contrast, the other extremity is typified by fully dis- 
tributed, or synergetic, processing wherein no process has a total view 
of the system, yet a multitude of local operating criteria, incorporated 
in the (semi) autonomous processes, provide the necessary cohesion and 
cooperation to allow for stable patterns of evolution to emerge. 

The interacting nature of information processing compels the participating
processes to use a language, or an encoding (not necessarily a
formal system), for the interchange of information through states, symbols,
or messages. An essential part of the processing capability is to
be invested in detecting and/or estimating received information in the
wake of distortions, introduced either due to insufficient precision in the
response of a process (ie, processes with malformed messaging units),
or due to interferences introduced by other coexisting information carriers
(or noise sources), possibly with the capability of dominating the
output of the information source of interest.

^It is common to interpret processes in the restricted sense of being processors. However,
the abstraction is applicable to situations wherein a distinctive informative identity
can be assigned to the various instances of processes. In the more sophisticated cases,
processors are ascribed agental status.

Each process participating in the processing of information should, 
therefore, be able to recognize the signals and messages put out by 
other processes (guided by the past), and to take decisions (action) in 
anticipation of that taken by other (competing) processes: this aspect is 
of importance in the design of the language, or coding scheme, used
between interacting processes. Thus, pattern recognition and (signal/ 
state) estimation form the essential core of information processing in 
automated systems, especially automation of intelligence. 

Recognition of patterns, also termed (signal) detection, essentially
involves an a priori identification of associations between prototype patterns
and corresponding labels, or cluster (or class) memberships, and
the problem at hand is to devise tests, or detectors, capable of representing
the requisite class memberships:^ available patterns are associated
with classes based on an appropriately chosen criterion of inter-pattern
distance. Known as classification, hypothesis testing, clustering, etc,
depending on the available information regarding class memberships,
and the context in which the pattern recognition task is being considered,
several detection methods have been suggested, especially in the
context of statistical decision making, wherein the signals are presumed
available in a noisy context.

^In this form, adaptive classification is not allowed. Further, the modulation of classification
by value systems is completely ignored. Both of these are considered essential
to appreciate human intellectual abilities. The aspect of adaptive categorization has been
considered in Edelman (1987) in relation to modeling of human perception.
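A minimal sketch of a detector of the kind described above assigns a pattern to the class of its nearest prototype. The Euclidean distance, the prototype vectors, the labels, and the function names are assumptions chosen for illustration; any other criterion of inter-pattern distance could be passed in its place.

```python
import math

def euclidean(a, b):
    """Euclidean inter-pattern distance (one assumed choice of criterion)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(pattern, prototypes, distance=euclidean):
    """Return the label of the prototype closest to `pattern`."""
    label, _ = min(prototypes, key=lambda item: distance(pattern, item[1]))
    return label

# Hypothetical prototype patterns with a priori class labels.
prototypes = [("A", (0.0, 0.0)), ("B", (1.0, 1.0)), ("C", (0.0, 2.0))]

print(classify((0.1, 0.2), prototypes))
print(classify((0.9, 1.2), prototypes))
```

The set of prototypes fixes a finite number of classes in advance, so the detector cannot adapt its categories, which is the limitation noted in the footnote.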

Pattern recognition necessarily has a finite number of classes: this
requirement stems from the terminability criteria particular to algo- 
rithms (Turing Machines), discussed in the theory of computation. Is- 
sues related to the performance of pattern recognizers, apart from those 
of computational complexity of the testing method, dwell on the like- 
lihood of misclassification: this issue translates to that of the error in 
approximating the (specified) class membership (ie, indicator) function.

Class membership, considered on a discrete - generally binary - scale 
(also called crisp) for long, is recently being viewed on multi-valued, 
even continuous, scales, and pattern recognition incorporating such 
measurements has been given the interpretation of capturing fuzzy
rules of inference. Automation of intelligence with fuzzy rules of in- 
ference is believed to be more anthropomorphic than that achieved 
through crisp rules of inference. 
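The difference between the two scales of class membership can be made concrete on a scalar feature. In the sketch below the threshold, the ramp end points, and the function names are hypothetical; the piecewise-linear ramp is only one of many possible graded membership functions.

```python
def crisp_membership(x, threshold=0.5):
    """Binary (crisp) membership: the pattern is either in the class or not."""
    return 1 if x >= threshold else 0

def graded_membership(x, low=0.25, high=0.75):
    """Continuous membership in [0, 1]: a linear ramp between `low` and `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

for x in (0.1, 0.4, 0.5, 0.6, 0.9):
    print(x, crisp_membership(x), graded_membership(x))
```

Near the class boundary the crisp function jumps between 0 and 1, whereas the graded function changes gradually, so nearby patterns receive nearby degrees of membership.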

Signal/state estimation, the other important constituent of information
processing in automated systems, consists of methods to extend
the notion of categorization to situations wherein the number of categories
is arbitrarily large, rendering the approach of pattern recognition
inapplicable. Estimation procedures need a knowledge of associations
between regions in the signal (state) space and the space of categories;
the latter could well be isomorphic to the signal (state) space on whose
members the estimation procedure is being applied.

In point estimators, the associations sought are between signal sam- 
ples, ie, points, and regions (subsets) of the space of categories. Estima- 
tion too, in an abstract sense, reduces to a problem of function approx- 
imation, ie, approximation of a function from an appropriate algebraic 
structure defined on the signal (state) space to another appropriately 
constructed algebraic structure on the space of categories. 

The knowledge needed by estimators is generally supplied in terms
of (parametrically expressed) likelihoods of associations, or, in the absence
of evidence for deriving such information, in terms of likely relationships
between (ordered) clusters in the signal space and the category
of the signal (state) in relation to which clusters are sought, ie,
non-parametric approaches. In view of the nature of processing involved
in estimation, these procedures have found utility in the prediction
of a portion of a signal given some other segment of the same,
typically prediction of future samples of a signal given a finite past.
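Prediction of a future sample from a finite past can be sketched as a small function-approximation problem. The example below assumes a second-order linear predictor fitted by least squares and a noise-free sinusoid as the signal; the predictor order, the signal, and the function names are illustrative choices rather than a method taken from the text.

```python
import math

def fit_predictor(signal):
    """Least-squares fit of x[n] ~ a1 * x[n-1] + a2 * x[n-2] (normal equations)."""
    r11 = r12 = r22 = b1 = b2 = 0.0
    for n in range(2, len(signal)):
        p1, p2, y = signal[n - 1], signal[n - 2], signal[n]
        r11 += p1 * p1
        r12 += p1 * p2
        r22 += p2 * p2
        b1 += p1 * y
        b2 += p2 * y
    det = r11 * r22 - r12 * r12
    a1 = (b1 * r22 - b2 * r12) / det
    a2 = (r11 * b2 - r12 * b1) / det
    return a1, a2

def predict_next(signal, coeffs):
    """Predict the sample that follows the last two samples of `signal`."""
    a1, a2 = coeffs
    return a1 * signal[-1] + a2 * signal[-2]

omega = 0.3                                   # assumed angular frequency
past = [math.cos(omega * n) for n in range(40)]
coeffs = fit_predictor(past)

print(coeffs, predict_next(past, coeffs), math.cos(omega * 40))
```

A pure sinusoid obeys x[n] = 2 cos(omega) x[n-1] - x[n-2], so the fitted coefficients approach 2 cos(omega) and -1 and the predicted sample agrees with the true one up to rounding. With noisy signals the same fit yields an estimate rather than an exact continuation.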

In the automation of intelligence, one of the goals is to extend pattern
recognition and (state) estimation to 'ideas' and 'concepts,' in addition
to performing the same on objects (ie, signals). The information used
in designing an automated system is generally termed the knowledge
base, and the essential issue in the design of these systems lies in the
representation of the knowledge base. Neural networks and classical
AI differ in the way knowledge bases are 'internally' represented.

Programming, in the sense of design and implementation of algo- 
rithms, plays an important role in the representation of knowledge in 
classical AI, whereas in neural networks, the task of knowledge rep- 
resentation is studied under the metaphor of learning. It is common 
to find that while classical AI is directed mainly at pattern recognition,
methods based on neural networks have been suggested to handle
problems related to pattern classification as well as estimation. 


A.2 Automation of Intelligence: Important Approaches

Connectionist or PDP models are catching on. There are conferences
and new books every day, and the popular science press hails
this new wave of theorizing as a breakthrough in understanding
the mind (a typical example is the article in the May issue of Science
86, called "How we think: A new theory"). There are also,
inevitably, descriptions of the emergence of Connectionism as a
Kuhnian "paradigmatic" shift. (See Schneider, 1987, for an example
of this and for further evidence of the tendency to view Connectionism
as the "new wave" of Cognitive Science.)

The fan club includes the most unlikely collection of people. 
Connectionism gives solace both to philosophers who think that 
relying on the pseudo-scientific intentional or semantic notions of 
folk psychology (like goals and beliefs) mislead psychologists into 
taking the computational approach (eg, Paul Churchland, 1981;
Churchland, 1986; Dennett, 1986); and to those with nearly 
the opposite perspective, who think that computational psychol- 
ogy is bankrupt because it doesn’t address issues of intentionality 
or meaning (eg, Dreyfus & Dreyfus, 1988). On the computer 
science side, Connectionism appeals to theorists who think that se- 
rial machines are too weak and must be replaced by radically new 
parallel machines (Fahlman & Hinton, 1986), while on the bio- 
logical side it appeals to those who believe that cognition can only 
be understood if we study it as neuroscience (eg, Arbib, 1975; Se- 
jnowski, 1981). It is also attractive to psychologists who think 
that much of the mind (including the part involved in imagery) is 
not discrete (eg, Kosslyn & Hatfield, 1984), or who think that 
cognitive science has not paid enough attention to stochastic
mechanisms or to "holistic" mechanisms . . . and so on and on. It appeals
to many young cognitive scientists who view the approach as not 
only anti-establishment (and therefore desirable) but also rigorous 
and mathematical . . . Almost everyone who is discontent with con- 
temporary cognitive psychology and current "information process- 
ing" models of the mind has rushed to embrace "the Connectionist 
alternative".

When taken as a way of modeling cognitive architecture, Connectionism
really does represent an approach that is quite different from that of
the Classical cognitive science that it seeks to replace.
Connectionists propose to design systems that can exhibit intelligent
behavior without storing, retrieving, or otherwise operating on
structured symbolic expressions. The style of processing carried out
in such models is thus strikingly unlike what goes on when
conventional machines are computing some function.

The term 'Connectionist model' (like 'Turing Machine' or '[v]on
Neumann machine') is thus applied to a family of mechanisms that
differ in details but share a galaxy of architectural commitments.

With these words,^ Fodor & Pylyshyn (1988a) introduce connectionism,
or neural networks, in contrast to Classical AI, the other dominant
approach to automated intelligence. In this section, I will trace
the common history of these approaches, and briefly outline the evo- 
lution of ideas in the mechanized expression of intelligence. Despite 
statements to the contrary, it is nearly impossible to argue that our 
perception of intelligence is different from information processing, and 
the view that intelligence is consequent on information processing, to- 
gether with the approach of function realization, has dominated inves- 
tigations in the connectionist approach to automated intelligence (see 
eg, Rosenblatt, 1958; Minsky & Papert, 1969; McClelland, Rumel- 
hart, et al, 1986a). 

^In this quotation, I have recoded the original citations to maintain consistency with
the citations in the rest of the document.

Automation of intelligence is traced uniquely to the pioneering efforts
of McCulloch & Pitts (1943) describing the logical calculus immanent
in nervous activity. While significant research had by then been
accomplished on the anatomy and even the physiology of the brain, the
brain had been expressed as a composition of neurons, and the neuronal
state transition had been studied as a function of electro-chemical
equilibration, McCulloch and Pitts were the first to recognize the
logical operations (in fact, operations of propositional calculus)
incorporated by the very structure of neurons.
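
The idea that a neuron-like threshold element realizes propositional
operations can be illustrated with a minimal sketch. The function names
below are illustrative and are not taken from McCulloch & Pitts (1943);
in particular, inhibition is modelled here by a negative weight, which
is a simplifying assumption rather than their original formulation.

```python
def threshold_unit(inputs, weights, threshold):
    """Return 1 (fire) when the weighted sum of binary inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def AND(a, b):
    # Both excitatory inputs must be active to reach the threshold.
    return threshold_unit((a, b), (1, 1), threshold=2)


def OR(a, b):
    # A single active excitatory input is enough.
    return threshold_unit((a, b), (1, 1), threshold=1)


def NOT(a):
    # Assumption: inhibition is represented by a negative weight.
    return threshold_unit((a,), (-1,), threshold=0)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
    print("NOT 0:", NOT(0), "NOT 1:", NOT(1))
```

Composing such units yields any propositional formula over binary
inputs, which is the sense in which the structure of the unit itself
carries the logic.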

With George Boole's mathematical theory of an algebra of (propositional)
logical operations (later named Boolean algebra) already available, the
discovery that biological neurons implement logical operations, made at
a time when electronics was slowly gaining ground, especially in the
realization of digital computers, inspired a significant number of
researchers to take up the study of biological information processing
systems. It is worth mentioning that McCulloch continued his work in
collaboration with Norbert Wiener in an area that Wiener termed
Cybernetics. Murray Eden (1983) traces the following.^

^At first glance, this quotation may seem out of context. However, it is important to
recognize that the character of Artificial Intelligence, whether classical or connectionist,
especially in the light of emerging trends in applications, is taking on the same aspect as
cybernetics, ie, AI is providing a framework for choice and decision-making. This aspect
of AI is increasingly being incorporated into the control of mechanisms.

Norbert Wiener, in his book entitled Cybernetics, or Control and
Communication in the Animal and the Machine, did not define
explicitly the word he believed he had coined . . . It had, in fact, been
used before by Andre-Marie Ampere in his Essai sur la philosophie
des sciences . . .

Ampere gave the following description of what he meant by
cybernetics: "Cybernetics [cybernétique]. Relations between peoples,
the subjects of study within . . . international law and diplomacy . . .
are only a small part of that which a good government must concern
itself. Maintenance of public order, administration of laws,
equitable distribution of taxes, selection of the people it must employ,
and all . . . other considerations . . . require the continual attention
of government. Choices must constantly be made, among diverse
measures, about which measure is most appropriate to achieve the
desired goal. Only by intensive study and comparison of the various
elements that, for each choice, are provided by a knowledge of
all that is relevant to the nation - its character, customs, opinions,
history, religion, way of life and property, institutions, and laws -
can government create the general rules of conduct that must guide
it in regard to each particular case. Therefore, it is only after all
the sciences that are concerned with these various factors that one
must place the science in question here. I would call this science
cybernetics from the word κυβερνητική. From the restricted definition
for the art of steering a vessel, cybernetics took on a meaning - even
among the Greeks - of the art of steering in general."

However, Ampere, despite his statement that he was generalizing
the concept of steering, was not aware that it could be extended
to the regulation of organismic behavior . . . His classification of
biology contained no niche for control, nor for that matter did his
classification of physics.

One group inspired by the work of McCulloch and Pitts was that led by
John von Neumann at Princeton. The structure of logical operations
implemented by neurons was the source of inspiration for the basic
electronic assemblies in the design of the first electronic digital
computer, ENIAC; these gates form the building blocks of present day
computational devices too. Associating each neuron with an automaton
(of the finite-state kind) and recognizing the importance of
interconnected ensembles of automata, von Neumann initiated the area
that he termed Cellular Automata (cf, von Neumann, 1959; 1966).

Though the initial hope was to seek an account of human intelligence
in terms of such interconnected ensembles of automata, von Neumann's
own admissions state that the area of cellular automata is far removed
from the problems crucial to capturing, or explaining, intelligence in
mechanistic terms (cf, Brink & Haden, 1987). In passing, we should
recognize that the field of cellular automata, though initiated by von
Neumann, owes its present existence and form to the extensive
investigations of Stephen Wolfram (1986) and others.

Though early computers were seen as manipulating numbers, the strings
of bits manipulated by a digital computer were soon recognized as being
capable of representing anything: numbers, of course, but also features
of the real world, as is evident in the following.

The digital-computer field defined computers as machines that
manipulated numbers. The great thing was, adherents said, that
everything could be encoded into numbers, even instructions. In
contrast, the scientists in AI saw computers as machines that
manipulated symbols. The great thing was, they said, that everything
could be encoded into symbols, even numbers. (Cf, Newell, Shaw
& Simon, 1958; Newell, 1983; Dreyfus & Dreyfus, 1988.)

Building on the interpretation of each neuron as an automaton, and of
intelligence as biologically expressed through an interconnection of
such finite-state machines (essentially symbol manipulation devices),
Newell and Simon proposed their views in the famed hypothesis
quoted below (cf, Newell & Simon, 1981; Dreyfus & Dreyfus, 1988).
This hypothesis forms an essential component of classical AI.

Physical Symbol System Hypothesis. A physical symbol system has
the necessary and sufficient means for general intelligent action.

By "necessary" we mean that any system that exhibits general 
intelligence will prove upon analysis to be a physical symbol sys- 
tem. By "sufficient" we mean that any physical symbol system of 
sufficient size can be organized further to exhibit general intelli- 
gence. 

Encouraged by this hypothesis, research efforts on General Problem
Solvers and Expert Systems were initiated to solve, once and for all,
the intricate problems that face all of humanity. The initial success
of this automated information processing approach led John McCarthy,
Marvin Minsky, Nathaniel Rochester and Claude E Shannon to formulate
the program of Artificial Intelligence (cf, McCorduck, 1979).

According to this program, studies of representation of knowledge,
formal methods to facilitate representation, a theory of information
and its processing, etc, were the thrust areas of research. This formal
systems approach is also known as the 'Top-Down' approach, as the
methodology adopted is to satisfy, at each level of inquiry, the
logically necessary tasks (operations) dictated by the goals enunciated
at the previous levels; this hierarchical structure is hoped ultimately
to pronounce the brain as the information structure logically necessary
and sufficient for the expression of intelligence.

While significant interest had been developing in the field of classical
AI, a few researchers, represented in their ideas best by Frank
Rosenblatt, were interested in a very different approach. In
Rosenblatt's own words (cf, Rosenblatt, 1962; Dreyfus & Dreyfus, 1988):

The implicit assumption [of classical AI] is that it is relatively
easy to specify the behavior that we want the system to perform,
and that the challenge is then to design a device or mechanism
which will effectively carry out this behavior . . . [I]t is both easier
and more profitable to axiomatize the physical system and then
investigate this system analytically to determine its behavior, than
to axiomatize the behavior and then design a physical system by
techniques of logical synthesis.


In this approach, also termed the 'Bottom-Up' approach, no assertion is
made that the brain is the only information processing structure able
to express intelligence; rather, an appeal is made to explore possible
structures for the mechanized expression of intelligence. Rosenblatt
was responsible for Perceptrons, machines claimed to be capable of
perception in ways similar to those exhibited by human beings. The
perceptrons were trained by presenting examples of the task to be
learnt, and the learning (of weights) was based on a rule of learning
in biological neurons, discovered by Hebb (1949), which states that the
interconnection strength between two (adjacent) neurons increases in
proportion to the relative simultaneity of their firing patterns.^
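
A rule of this kind is often written as an increment proportional to
the product of the two activities. The sketch below assumes that
common textbook form; the function name, the learning rate and the
binary activity values are illustrative assumptions, not details
given by Hebb (1949).

```python
def hebbian_update(weights, pre, post, rate=0.1):
    """One Hebbian step: weights[i][j] grows by rate * post[i] * pre[j].

    weights[i][j] is the connection from pre-synaptic unit j to
    post-synaptic unit i; a connection is strengthened only when
    both of its units are active together.
    """
    return [
        [w_ij + rate * post_i * pre_j for w_ij, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]


weights = [[0.0, 0.0], [0.0, 0.0]]  # 2 post-synaptic x 2 pre-synaptic units
pre = [1, 0]    # only the first pre-synaptic unit fires
post = [1, 1]   # both post-synaptic units fire
weights = hebbian_update(weights, pre, post)
print(weights)  # only connections from the active pre-synaptic unit change
```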


^Anderson & Rosenfeld (1989), p 1-3, introducing William James (1989), point
out that a rule similar to that proposed by Hebb, though at a macroscopic scale, involving
neural clusters rather than just neurons, was suggested by Sir William James. This
makes one ponder whether such multi-scale organizational patterns exist in the brain, and
whether such a structure can be exploited, through current knowledge about fractals
(Mandelbrot, 1987; Barnsley, 1988) and scale-space filtering (Witkin, 1986), for a better
understanding of the nature and structure of cognition. It would, indeed, be exciting if the
search for structure (a research program aimed at a search for structure has been initiated
in the field of crystal growth) is unified with the microstructure of cognition: such a
unification would need the language of general systems theory (Klir, 1977), synergetic
systems (Haken, 1977) and self organization (Hawkins, 1961; Haken, 1983; Kohonen,
1984; von Foerster & Zopf, 1962). The extensive investigations, by physicists, relating
Ising spin systems (ie, statistical thermodynamics) to neural networks (see, eg, Amit,
1989; van Hemmen, 1986; van Hemmen, Grensing, et al, 1988a; 1988b) suggest that
the (micro) structure of cognition might not be unrelated to that of crystal growth, in
particular, the emergence of macroscopic (topological) symmetries.

Perceptrons have been the basis of investigations in neural networks,
or connectionist networks, carried out in the greater part of the
latter half of this century. McClelland, Rumelhart, et al (1986a)
describe perceptrons as follows.


Such machines consist of what is generally called a retina, an array
of inputs sometimes taken to be arranged in a two-dimensional
spatial layout; a set of predicates, a set of threshold units with
fixed connections to a subset of units in the retina such that each
predicate computes some local function over the subset of units to
which it is connected; and one or more decision units, with
modifiable connections to the predicates.^
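
The three components named in this description can be sketched in a
few lines. The retina size, the particular predicates, the training
examples and the use of the classical error-correction rule for the
decision unit are all assumptions made for illustration; they are not
taken from McClelland, Rumelhart, et al (1986a).

```python
def predicate(retina, subset, threshold):
    """Fixed threshold unit connected to a subset of retina cells."""
    return 1 if sum(retina[i] for i in subset) >= threshold else 0


# Hypothetical fixed predicates over a 4-cell retina: (subset, threshold).
PREDICATES = [((0, 1), 1), ((2, 3), 1), ((0, 1, 2, 3), 3)]


def features(retina):
    return [predicate(retina, subset, thr) for subset, thr in PREDICATES]


def decide(weights, bias, feats):
    """Decision unit: threshold on a weighted sum of predicate outputs."""
    return 1 if sum(w * f for w, f in zip(weights, feats)) + bias > 0 else 0


def train(examples, epochs=20, rate=1.0):
    """Adjust only the modifiable connections (error-correction rule)."""
    weights, bias = [0.0] * len(PREDICATES), 0.0
    for _ in range(epochs):
        for retina, target in examples:
            feats = features(retina)
            error = target - decide(weights, bias, feats)
            weights = [w + rate * error * f for w, f in zip(weights, feats)]
            bias += rate * error
    return weights, bias


# Target: 1 when both the left half and the right half of the retina are active.
EXAMPLES = [
    ((0, 0, 0, 0), 0), ((1, 0, 0, 0), 0), ((0, 0, 0, 1), 0),
    ((1, 0, 0, 1), 1), ((1, 1, 1, 0), 1), ((0, 1, 1, 0), 1),
]
weights, bias = train(EXAMPLES)
print([decide(weights, bias, features(r)) for r, _ in EXAMPLES])
```

Note that only the decision unit learns; the predicates stay fixed, so
what the machine can represent is bounded by the chosen predicates.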


Rumelhart and McClelland use the term Parallel Distributed Processing
(PDP) for the cognitive models described by neural networks
(connectionist AI).

Smolensky (1990), however, disagrees with the idea that connectionist
information processing has to conform architecturally with the
brain, and in his proposal for a Sub-Symbolic Processing Paradigm
states the following.


^In present terminology, Rosenblatt's perceptrons would be termed 2-layered feed-forward
neural networks (3-layered, if layering is considered on the basis of entities in
an ensemble bearing information rather than their exhibiting information processing
ability).

[T]he term "subsymbolic" is intended to suggest cognitive descriptions
built up of the constituents of the symbols used in the symbolic
paradigm [ie, classical AI]; these fine-grained constituents might
be called subsymbols. Entities that are typically represented in
the symbolic paradigm by symbols are typically represented in the
subsymbolic paradigm by a large number of subsymbols . . . Subsymbols
are not operated upon by "symbolic manipulation": they
participate in numerical - not symbolic - computation . . .

Since the level of cognitive analysis adopted by the subsymbolic
paradigm for formulating connectionist models is lower than
the level traditionally adopted by the symbolic paradigm, for the
purposes of relating these two paradigms it is often important to
analyze connectionist models at a higher level; to amalgamate, so
to speak, the subsymbols into symbols . . . I will call the preferred
level of the symbolic paradigm the conceptual level and that of the
subsymbolic paradigm the subconceptual level . . .

The intuitive processor possesses a certain kind of connectionist
architecture (which abstractly models a few of the most general
features of neural networks) . . .

[K]nowledge in a connectionist system lies in its connection
strengths . . .

The intuitive processor is a subconceptual connectionist dynamical
system that does not admit a precise formal conceptual-level
description.

Given an input, a subsymbolic system outputs a set of inferences
that, as a whole, gives a best fit to the input, in a statistical
sense defined by the statistical knowledge stored in the system's
connections.


Rumelhart & Norman (1981) argue that 

Information [in neural networks] is not stored anywhere in particular.
Rather, it is stored everywhere. Information is better thought
of as "evoked" than "found."

This contrast of the nature of knowledge representation in neural
networks with that in classical AI is further strengthened by the
following.^

^This quotation also facilitates an understanding of the underlying reason for the
alternative labels connectionism and Parallel Distributed Processing for neural networks.

In most models, knowledge is stored as a static copy of a pattern.
Retrieval amounts to finding the pattern in long-term memory and
copying it into a buffer or working memory. There is no real
difference between the stored representation in long-term memory
and the active representation in working memory. In PDP models,
though, this is not the case. In these models, the patterns
themselves are not stored. Rather, what is stored is the connection
strengths between units that allow these patterns to be re-created.

[I]f the knowledge is [in] the strengths of the connections, learning
must be a matter of finding the right connection strengths so
that the right patterns of activation will be produced under the
right circumstances. This is an extremely important property of
this class of models, for it opens up the possibility that an
information processing mechanism could learn, as a result of tuning its
connections, to capture the interdependencies between activations
that it is exposed to in the course of processing . . .

[K]nowledge about any individual pattern is not stored in the
connections of a special unit reserved for that pattern, but is
distributed over the connections among a large number of processing
units.^ (McClelland, Rumelhart, et al, 1986a).


It is indeed difficult to imagine that two different, and in fact
contradictory, theories, each claiming scientific status and the
ability to account for (human) intelligence, could co-exist without
controversy. A sociological history of the controversies in artificial
intelligence has been studied by Mikel Olazaran (1993). Controversies
in artificial intelligence are not new, and have not been settled.^

^This latter statement refers to the controversy of grandmother cells. Readers
interested in this controversy could refer to Hofstadter (1979).

^Indeed, a careful study of the history of the two dominant approaches to artificial
intelligence, viz classical AI and neural networks, would reveal the strong undercurrent of
development of scientific theories through paradigmatic revolutions suggested by Kuhn
(1962), Schneider (1987), as well as the vacillation between competing (or contesting)
scientific theories proposed by George Wald (see Scientific American in 1966). The area
of artificial intelligence provides a very good case for study by students of the philosophy
(as well as sociology and history) of Science. It is to be noted, however, that in the limited
focus that a thesis can take, such a study will not be attempted. Rather, the preceding
quick comparison of the philosophical leanings of these two dominant approaches has
been provided to help an appreciation of neural networks in the proper perspective.

The only commonality between the two approaches to artificial
intelligence, aside from their seeking an account of similar phenomena
related to intelligence, is that both attempt to provide a physicalist,
reductionist account of intelligence. However, the essential difference
between the two approaches is that while classical AI seeks to maintain
the Cartesian distinction between mental events and physical events,
whereby the brain (or information processing structure) is merely the
substrate of intelligence (Pylyshyn, 1984), neural networks identify
mental states with brain-states (Churchland, 1986), while acknowledging
the possibility of using mental states as macros for a collection, ie,
a chain, of brain-states.

Operationally, classical AI focuses on problems related to (knowl- 
edge) representation and identification of means for processing of rep- 
resented knowledge, generally through rewriting rules prescribed by 
several forms of automata. Thus, Logic and Formal systems play a very 
important role in this approach to automated intelligence. Some of the 
prominent formalisms^ for information processing are (Universal)
Turing Machines, Generative Grammars (Hopcroft & Ullman, 1989),
Finite State Machines (Kohavi, 1978), and Normal Algorithms (Ershov
& Palyutin, 1984).

^For specialized situations of information processing, as seen, eg, in the case of complex
interconnected dynamical systems, these general formalisms are not quite convenient.
Discrete Event Systems (Wonham, 1989) and Petri Nets (Murata, 1989) have been used
in formalizing the information processing (typically the control mechanism) in the arena of
networked computing nodes.

The equivalence of these different formalisms has been postulated
by Church's Thesis (Hopcroft & Ullman, 1989; Lewis & Papadimitriou,
1981). Neural network research, on the other hand, is an empirical
enquiry aimed at identifying structures capable of representing
specified information processing requirements, with the hope that a
vast knowledge-base of architectures related to information processing
tasks would provide reasonable pointers for selecting an architecture
given an information processing task.^

The limitations of a mechanized expression of 'intelligent behavior',
commonly understood through the Turing (1981) test, were anticipated
by Descartes (1960); also see Dennett (1988).

It is indeed conceivable that a machine could be made so that it 
would utter words, and even words appropriate to the presence of
physical acts or objects which cause some change in its organs; as, 
for example, if it was touched in some spot that it would ask what 
you wanted to say to it; if in another, that it would cry that it was 
hurt, and so on for similar things. But it could never modify its 
phrases to reply to the sense of whatever was said in its presence, 
as even the most stupid of men can do. 

^This view, however, generates the apprehension and/or hope that, operationally,
connectionist AI would ultimately be indistinguishable from classical AI, unless the
knowledge base of connectionist architectures is sought to be represented by neural networks;
but this too, in the final analysis, would need a universal neural network that might turn
out to be no different from the existing notion of Universal Turing Machines.

Technological and philosophical limitations of the two prominent
approaches to artificial intelligence, ie, Classical AI and neural
networks, have been pointed out by several investigators. Discussions
of this important topic can be found in the literature (see, eg,
Partridge & Wilks, 1990).


A.3 Nonlinear Signal Processing 

Repeated attempts at incorporating the processes of automation have
presented new challenges in the area of signal processing, particularly
pattern recognition, decision-making and filtering (including signal
estimation), not merely in terms of speed or throughput of information
processing, but also in terms of the context in which information
processing is to be supported. Linear processors, with or without
adaptivity, have been considered, in the not too distant past, to work
satisfactorily over a wide variety of applications, and have also been
proved to be optimal over the class of all possible filtering
operations when the signal to be processed is available in the context
of additive (white) Gaussian noise (cf, Orfanidis, 1988).

In recent times, however, linear approaches have been found
unsatisfactory or inadequate in several application areas (like image
and speech processing, sonar/radar signal processing, protein
identification, etc) wherein human beings, with requisite training,
have been found to perform well, though not always at the desired
speeds: in these applications, neither additivity nor Gaussianness of
the noise can be assured. Part of the inadequacy arises from the fact
that linear filtering, which generally involves system identification,
requires an exhaustive account of the dependence of the output signal
on the input signal, preferably as closed-form expressions. It may not
always be possible to obtain an exhaustive account of input-output
dependencies, especially when the causes for distortions in the
processing are not known in sufficient operational detail.

Linear signal processing typically provides an ability to describe
processors either in the domain of signal definition (commonly time
and/or space) or, equivalently, in the spectral domain. Because
integral (and thereby spectral) transforms map convolution in one
domain to point-wise multiplication in the other, appropriate spectral
transforms allow for a reduction in processing complexity. An
immediate consequence is that in the linear approach to signal
processing, also referred to as conventional signal processing,
signals, whether analog or digital, are described parametrically, and
all processing operations reduce to either an estimation of (signal)
parameter(s) given [sufficiently many] observations or a manipulation
of parameter(s) in the input signal space to obtain the desired kind
of output signal space.
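
The mapping of convolution to point-wise multiplication can be checked
numerically on a small example. The sketch below is illustrative only:
it uses a direct O(n^2) discrete Fourier transform written out by hand
(so no reduction in complexity is demonstrated), the sample values are
arbitrary, and the convolution is circular, as the discrete transform
requires.

```python
import cmath


def dft(x):
    """Discrete Fourier transform, written out directly for clarity."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def idft(spectrum):
    """Inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]


def circular_convolve(x, h):
    """Circular convolution computed in the signal domain."""
    n = len(x)
    return [sum(x[j] * h[(k - j) % n] for j in range(n)) for k in range(n)]


x = [1.0, 2.0, 0.0, -1.0]   # arbitrary input samples
h = [0.5, 0.25, 0.0, 0.0]   # arbitrary impulse response

direct = circular_convolve(x, h)
product = [a * b for a, b in zip(dft(x), dft(h))]   # point-wise multiplication
via_spectrum = [value.real for value in idft(product)]

print(direct)
print([round(v, 6) for v in via_spectrum])
```

The two printed lists agree up to rounding, which is the property that
fast transforms exploit to reduce the cost of linear filtering.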

Nonlinear approaches to signal processing have been investigated in
the literature, with claims of success, in an attempt to overcome
limitations of the linear approach. In both linear and nonlinear
approaches, processing implies an identification of a neighbourhood^
(a localized region) in the domain of the input signal corresponding
to every point in the domain of the output signal, followed by a
process of assigning, to every point in the output signal (domain), a
function of the assignments to the input signal over the corresponding
neighbourhood. This dependence of the output on localized regions of
the input facilitates a context-sensitive processing wherein the
contextual information is expected to be provided by incorporation of
an appropriate neighbourhood structure in the signal model.

While in linear signal processors the identification of neighbourhoods
as well as the evaluation of assignments to the output are linear, no
such imposition of linearity is made in the case of the nonlinear
approaches. Historically, nonlinear signal processing has been of
considerable interest since the 1980s, though these schemes were known
in the mathematical and statistical literature earlier. Of the several
approaches suggested, order statistics (typically median filtering),
Volterra and Wiener series based filtering, homomorphic filters, and
filtering based on morphological approaches, cellular automata or
normal algorithms are the most popular: the latter two approaches are
also termed symbolic signal processing. Except for the symbolic
approaches, the others are commonly expressed in stochastic terms to
tackle signals in noisy contexts.
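
The contrast between a linear and a nonlinear assignment over the same
neighbourhoods can be seen in a small sketch. The window radius, the
truncation of windows at the ends of the signal and the test signal
are assumptions chosen for illustration.

```python
from statistics import median


def neighbourhood(signal, k, radius):
    """Samples within `radius` of index k (window truncated at the ends)."""
    return signal[max(0, k - radius): k + radius + 1]


def mean_filter(signal, radius=1):
    """Linear processor: each output is the average over its neighbourhood."""
    windows = (neighbourhood(signal, k, radius) for k in range(len(signal)))
    return [sum(w) / len(w) for w in windows]


def median_filter(signal, radius=1):
    """Nonlinear (order-statistic) processor over the same neighbourhoods."""
    return [median(neighbourhood(signal, k, radius)) for k in range(len(signal))]


signal = [1, 1, 1, 9, 1, 1, 1]   # constant signal with one impulsive outlier

print(mean_filter(signal))     # the outlier is smeared over its neighbours
print(median_filter(signal))   # the outlier is rejected outright
```

Both processors use identical neighbourhoods; only the function applied
to the assignments within each neighbourhood differs.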

Nonlinear signal processing is characterized by a grammatical
approach, ie, signal spaces are identified with grammars^ involving a
collection of production or rewriting rules, essentially formalisms
specifying the mechanism of deriving assignments to a signal from
relevant localized regions. Signal processing, in this approach, is
considered in terms of a transformation of the grammar describing the
input signal space to a desired grammar applicable to the output signal
space. In passing, it is important to note that integral (spectral)
transforms capture transformations between grammars, and hence, while
the grammatical approach is not unique to nonlinear processors, it
happens to be an (attempt at a unified) outlook in the absence of the
spectral approach in nonlinear processors. The grammatical approach to
signal processing has encouraged a breakaway from the tradition of
parametric signal representation common in presentations of linear
signal processors.

^Neighbourhoods are typically defined through delay or shift operators.

^I am stretching the notion of grammars to rule based methodologies to apply the
notion to discrete symbol spaces as well as continuous spaces of numbers.
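
A rewriting rule that derives each assignment from a localized region
can be made concrete with a one-dimensional cellular automaton. The
sketch below is only an illustration of that idea: the choice of
rule 90 (new value = left neighbour XOR right neighbour), the
wrap-around boundary and the initial state are assumptions, not
constructions from the text.

```python
# Rule table: (left, centre, right) -> new centre value.
# Rule 90 is used purely as an example of a local rewriting rule.
RULE_90 = {(l, c, r): l ^ r for l in (0, 1) for c in (0, 1) for r in (0, 1)}


def rewrite(cells, rule):
    """Apply the rule once to every cell, using wrap-around neighbourhoods."""
    n = len(cells)
    return [rule[(cells[(k - 1) % n], cells[k], cells[(k + 1) % n])]
            for k in range(n)]


state = [0, 0, 0, 1, 0, 0, 0]
for _ in range(3):
    print("".join(".#"[c] for c in state))
    state = rewrite(state, RULE_90)
```

Replacing the rule table changes the processor without touching the
neighbourhood structure, which is the sense in which processing becomes
a transformation between rule systems.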

Linear approaches to signal processing have been, by far, the most
frequently used. The main attraction of linear signal processors lies
in the fact that linearity allows for ease in implementation and
analysis, and it is adequate to have adders, multipliers (linear gain
units) and delay units (or shifters) to implement the processor, in the
time (space) and/or spectral (spatial-frequency) domain. Linearity is
also useful in unifying several conceptually distinct processing steps,
allowing for simplified and computationally efficient processing. The
concept of linearity having algebraic equivalents, linear signal
processing has been easily extended to situations of processing
involving not merely the object level signal descriptions (ie, data
involving indexed collections of numerical assignments), but also meta
level constructs like functions and functors (cf, Krishnan, 1981).

Signal processing, in the tradition of linearity, has now been ex- 
tended to include signals described through sets, signals on topologies, 
signals on lattices, monoidal signal descriptions, and more general al- 
gebraic descriptions. The main emphasis in this approach is to con- 
centrate on the invariances (or symmetries) preserved in the transfor- 
mation, and to analyze, design and realize (synthesize) processors with 
the rich fund of knowledge that a search for structure would provide. 

As signals are, in general, available in environments that involve 
perturbations, the processing by linear filters is optimal when the noise 
mixing with the signal is additive and has Gaussian statistics. When 
the mean square error criterion is used for realizing processors expected 
to handle signals corrupted by (additive Gaussian) noise, closed form 
expressions for the processors are quite easily available, for the pro- 
cessing model as well as the specific choice of parameters to be used 
with such a model. (This, in itself, concisely states all the advantages 
of linear signal processing approaches.) 

In signal processing, an important aspect to be noted is that the 
approach to processor design is inevitably one of approximation (and 
associated representation). As in all other situations of approximation, 

Search based on mean squared error essentially describes the error evolution as a
trajectory in $L^2$. The advantage of closed form expressions for processing and parameter
selection can be expected when the evolution of error is sought in $L^p$, $p = 2, 3, \ldots$

here too, we find that the ultimate criterion for acceptability (satisfi-
cability) of approximation rests in human (or biological) perception, a
situation that cannot be modeled conclusively forever. In this light, it
is significant to note that even as we accept a certain level of approx-
imation, our resolution of discrimination improves with our technologi-
cal involvement, thereby creating a further desire to better the
present levels of approximation. This biologically grounded insatiable
(and tenacious) demand for realizing the ideal has been seen in the 
development of signal processors too, particularly with the advent of 
digital processing technology (which recalls the history of automation). 

A manifestation of the biological aspect of satisficability is seen in 
the qualification of performance of signal processors in various contexts 
of signal availability, particularly the statistics and operational nature 
of noise corruption. More specifically, the performance of linear signal 
processors has, of late, been considered unsatisfactory in the presence of 
noise which is either non-Gaussian, or is not of additive nature. Such 
situations are not very hard to find in the present context wherein 
information processing (through automation and artificial intelligence) 
has begun to pervade almost all aspects of our life. Examples abound 
in the realistic areas of computational vision, speech processing, sonar 
target detection etc. 

Adaptivity in linear signal processors is one of the earliest known 
methods to overcome some of the shortcomings of purely linear ap-
proaches. The focus in adaptive signal processing is to consider the
processors as belonging to a certain a priori chosen parameterized
model class, and the specific parameters are adapted in accordance
with the processing requirements while the processing is in operation.
In this approach, it is expected that as the processing progresses over
the incident signal, sufficient information would be available to
characterize the necessary deviation from global linearity.

In the realization of prediction filters (most popular being that due 
to Kalman), adaptation has been put to good use. The environment is 
modeled in the process of parameter adaptation and the current knowl- 
edge of the environment is used to estimate the signal values to be 
expected in the region(s) yet to be processed (in case of one-dimensional 
signal definition domain with a natural ordering, this is an estimate of 
a future signal sample). For the sake of completeness, it is worthwhile 
to remark that adaptation of parameters, formulated as a search for an 
optimal solution, is generally accomplished by variants of (stochastic) 
gradient descent; common strategies are the Recursive Least Squares ap-
proach and the Least Mean Squares (also called Widrow-Hoff) approach.^®
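
The Least Mean Squares strategy mentioned above can be illustrated with a minimal sketch. It assumes a finite impulse response model class; the unknown system `true_taps`, the step size and the noise level are hypothetical values chosen only for illustration, and none of them come from the text.

```python
import random


def lms_identify(inputs, desired, num_taps, step_size):
    """Adapt FIR weights with the Widrow-Hoff (LMS) rule; return (weights, errors)."""
    weights = [0.0] * num_taps
    errors = []
    for k in range(len(inputs)):
        # Most recent num_taps input samples, newest first (zeros before the start).
        window = [inputs[k - i] if k - i >= 0 else 0.0 for i in range(num_taps)]
        estimate = sum(w * x for w, x in zip(weights, window))
        error = desired[k] - estimate
        # Stochastic gradient step on the instantaneous squared error.
        weights = [w + step_size * error * x for w, x in zip(weights, window)]
        errors.append(error)
    return weights, errors


def demo_lms(seed=0, num_samples=4000):
    """Identify a hypothetical 3-tap system from noisy observations."""
    rng = random.Random(seed)
    true_taps = [0.5, -0.3, 0.1]  # assumed unknown system (illustrative only)
    x = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    d = []
    for k in range(num_samples):
        clean = sum(h * (x[k - i] if k - i >= 0 else 0.0)
                    for i, h in enumerate(true_taps))
        d.append(clean + rng.gauss(0.0, 0.01))  # additive Gaussian noise
    weights, _ = lms_identify(x, d, num_taps=3, step_size=0.01)
    return true_taps, weights
```

Calling `demo_lms()` returns the assumed taps together with the adapted weights. The parameters are updated while the signal is being processed, which is the sense of adaptation described in the paragraph above.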

Parameter adaptation in linear processors is a fruitful strategy to 
overcome some of the failings of purely linear signal processors. How- 
ever, adaptation brings in limitations of its own, most notable being 
the cumulative effect, compounded by possible amplification, of errors 

A unified study of $L^p$ approaches to adaptive parameter selection, in signal processor
design, has been considered by Chaturvedi (1994).

incurred in the processing of initial signal segments. Non-linear ap-
proaches, originally proposed by Wiener (1958), have been inducted
into signal processing, largely since the 1980s, to overcome the general
limitations of linear signal processors.

The belated incorporation of nonlinear approaches in signal pro- 
cessing is easily traced to limitations imposed by technological issues 
in the realization of nonlinear operators, and analytical intractability 
of nonlinear systems, particularly the absence of spectral approaches 
to processor realization, and the inability of grouping processing stages 
into simpler processing modules. Order statistics (Pitas & Venet-
sanopoulos, 1990), typically median filtering, homomorphic methods,
morphological approaches and Volterra operators (Schetzen, 1980) are
some of the salient incorporations of nonlinearity in signal processing.
Studies into the dynamics of nonlinear systems, particularly (determin-
istic) chaos (Moon, 1987), have been incorporated in the modeling of
signals and also of processor characteristics (eg, van der Pol oscillators
(op cit)).
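
As a small illustration of the order statistic filters named above, the following sketch compares a median filter with a linear moving average on a signal corrupted by a single impulse. The window length and the test signals are assumptions made for the example.

```python
from statistics import median


def median_filter(signal, window=3):
    """Replace each sample by the median of its neighbourhood (truncated at the ends)."""
    half = window // 2
    out = []
    for k in range(len(signal)):
        lo = max(0, k - half)
        hi = min(len(signal), k + half + 1)
        out.append(median(signal[lo:hi]))
    return out


def moving_average(signal, window=3):
    """Linear counterpart: replace each sample by the mean of its neighbourhood."""
    half = window // 2
    out = []
    for k in range(len(signal)):
        lo = max(0, k - half)
        hi = min(len(signal), k + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


spiky = [1, 1, 1, 9, 1, 1, 1]   # constant signal with one impulse
step = [0, 0, 0, 5, 5, 5]       # ideal edge
```

On `spiky`, the median filter removes the impulse entirely, while the moving average spreads it over neighbouring samples. On `step`, the median filter leaves the edge unchanged. Both effects follow from the non-additive nature of the median operation.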

Signal processing with neural networks (Lippmann, 1987; Kosko, 
1992a; 1992b; Haykin, 1994) has been given considerable attention 
in the past decade, and filters realized through neural computation 
have been related to stack filters (an important class of nonlinear fil- 
ters, cf, Yin, Astola & Neuvo, 1993b). Neural networks, for several 
reasons, have been considered attractive for processor realization, im-
portant among these are the approach to processor realization, and
nature of processing. Given relevant examples of the required process-
ing, as the necessary (internal) representations can be learnt, neural
networks provide a framework for processor realization in situations
wherein knowledge about the processing is not available in formal terms
(specifically closed form expressions). The approximation provided by
neural networks has been shown to be a maximum a posteriori esti-
mate (Golden, 1988).


A.4 Interpretations in Neural Signal Processing 

Signal processing with neural networks, studied as a comparison of ab- 
stract formalisms, relies on approximating functions as a linear span 
of appropriately chosen basis functions, and layering, in neural net- 
works, provides the necessary framework for design (or synthesis) of 
the required basis functions. In this sense, the process of learning is 
to address the problem of finding appropriate basis functions given the 
nature of the processor through examples in the training set, and, si- 
multaneously, the extent to which these basis functions contribute to 
the desired function is also to be determined. 
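
The reading of a layered network as a linear span of basis functions can be sketched as follows. This is only an illustration under assumed values: the hidden weights, thresholds and output coefficients below are chosen by hand rather than learnt, and the function names are hypothetical.

```python
import math


def sigmoid(u):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-u))


def hidden_basis(x, hidden_weights, hidden_thresholds):
    """Evaluate each hidden unit, ie, each basis function, at the input vector x."""
    return [sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) - theta)
            for w, theta in zip(hidden_weights, hidden_thresholds)]


def network_output(x, hidden_weights, hidden_thresholds, output_coeffs):
    """Output as a linear combination (span) of the hidden-layer basis functions."""
    basis = hidden_basis(x, hidden_weights, hidden_thresholds)
    return sum(c * phi for c, phi in zip(output_coeffs, basis))


# Hand-picked example: two shifted sigmoids combined into a localized 'bump'.
BUMP_WEIGHTS = [[10.0], [10.0]]
BUMP_THRESHOLDS = [-10.0, 10.0]
BUMP_COEFFS = [1.0, -1.0]
```

With these values the output is close to one near the origin and close to zero away from it. Learning, in the sense of the paragraph above, would adjust both the hidden parameters (the basis functions) and the output coefficients (their contributions).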

As information processing is expected in varied situations of signal 
availability, it is imperative that the potential of neural networks in 
accommodating different meanings to signals and their association be 
studied. In this context, I will concentrate on some of the salient inter-
pretations that have been given to the neural information processing
approach. To begin with, note that inputs, conductances, weighting
values, and outputs (actions/responses) ($x_i$, $w_i$, $i = 1, 2, \ldots, n$, and $y$, re-
spectively), in the formal model of neurons presented in § 2.2, are allowed
to be signed. Numbers corresponding to input intensities, weights, and
outputs are commonly constrained to be non-negative, and a histor-
ical practice has been to partition the inputs as being excitatory or
inhibitory, on the basis of the influence of isolated inputs on the output.

While this practice originates in the empirical accounts of biological 
expressions of information processing, it has been a recent tradition, 
stemming from considerations of analytical convenience, to regard val-
ues (numbers) with positive signs as corresponding to excitatory condi-
tions and values with the opposite (ie, negative) sign as corresponding
to inhibitory conditions: situations of inactivity are still identified with
the origin (ie, zero) of the specific (naturally ordered) numbering sys-
tem used. A specific consequence of this altered encoding is to permit
inputs to switch (under conditions of learning and/or adaptation) be- 
tween excitatory and inhibitory modes, possibly in contravention to 
expected biological principles. However, the technological significance 
of accommodating switched input channel modes cannot be ignored. 

An inspection of the formal model of a neuron, under steady state
conditions of additive dynamics, immediately reveals a framework sim-
ilar to hypothesis testing (typically sign test), and in this interpretation,
we also notice that the use of a sigmoidal action function is equivalent
to a randomization (cf, Lehmann, 1986) of a binary decision unit (ie,
a neuron with hard limiting action function). Neural networks differ
from statistical hypothesis testing, at a paradigmatic level, in the way
tests are constructed. The procedure of training, in neural networks, is
expected to address the manner in which tests for prevailing hypothe-
ses are constructed, though no explicit effort (step) at identifying the
hypotheses is considered necessary.
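
One way to read the equivalence between a sigmoidal action function and a randomized binary decision is sketched below. The sketch assumes a logistic sigmoid and models the randomization as logistic noise added to the membrane potential before hard limiting; both choices are assumptions for illustration, not the construction used in the cited reference.

```python
import math
import random


def sigmoid(u):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-u))


def hard_limiter(u):
    """Binary decision unit: 1 when the argument is positive, else 0."""
    return 1 if u > 0 else 0


def randomized_decision(eta, rng):
    """Hard-limiting decision on the potential eta perturbed by logistic noise."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() lies in [0, 1)
        u = rng.random()
    noise = math.log(u / (1.0 - u))  # logistic sample via the inverse CDF
    return hard_limiter(eta + noise)


def firing_frequency(eta, trials=20000, seed=0):
    """Fraction of trials in which the randomized unit decides 1."""
    rng = random.Random(seed)
    return sum(randomized_decision(eta, rng) for _ in range(trials)) / trials
```

Under these assumptions the probability of a positive decision is exactly `sigmoid(eta)`, so `firing_frequency(eta)` approaches the sigmoid value as the number of trials grows.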

Statistical hypothesis testing, based on the ’top-down’ approach 
characteristic of formal systems, places significant emphasis on the 
identification of hypotheses that would constitute a description of mem- 
bers in the relevant input space: the hypotheses, so identified, are used 
as the basis for designing necessary tests that would allow, by means 
of an input space characterization (model), a synthesis of the desired 
processor. As hypothesis testing, estimation and filtering reduce to 
function approximation/synthesis, neural networks provide a common/ 
unified framework for these important facets of signal processing. 

Inter-neural interconnections, the basis of complex behavior in neu- 
roscience, are not compelled to exhibit time-invariance in all instances 
of neural network models. Commonly, in neural network related lit- 
erature, the interconnection strengths are identified with long term 
memory traces, and the neural inputs and outputs (ie, activations)
with short term memory traces (cf, Grossberg, 1988). Time varying
interconnection strengths are of interest in studies involving adapta-
tion of interconnection strengths (as, eg, in Adaptive Resonance Theory
of Carpenter & Grossberg, 1987a), and investigations, by computa-
tional bio-physicists, aimed at developing theories capable of a plausible
account of learning in biological systems.

A certain degree of reluctance is exhibited, in the present trend of 
research in artificial neural networks, in accepting the interconnection 
strengths on the same footing as programs of digital computers. How- 
ever, as the engineering relevance of neural networks is increasingly 
being appreciated, it would not be long before this status is indeed 
imputed to interconnection strengths, and rather than the present at- 
tempts to realize functions on a global time-scale, time-slicing (ie, time- 
sharing), familiar in the (nearly) simultaneous usage of digital comput- 
ers by several users, would dominate the mode of function realization. 
Each time-slice is associated with an appropriate set of interconnec- 
tion strengths, thereby allowing for efficient, time-localized, function 
realization and the switching of context between time-slices, based on 
past history, would then be an interesting problem to be tackled in the 
neural paradigm.^® 

^®This approach, while immediately advantageous in the utilization of neural net-
works, might be of relevance as a viable, and plausible, model of functioning in biological
information processing substrate (typically brain).

Action/categorization functions play a very important role in the per-
formance, and, consequently, the taxonomy of neural networks. Histor-
ically, the hardlimiter function has been used for discrimination since
the pioneering work of McCulloch & Pitts (1943) and is common
in discourses relating the function of neural networks to formulae of
propositional calculus, ie, functions of (crisp) Boolean logic.
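
The relation between hardlimiter neurons and Boolean formulae can be made concrete with a short sketch. It assumes inputs coded as 0 and 1 and a unit that fires when the weighted sum reaches the threshold; the particular weights and the search grid are illustrative choices.

```python
from itertools import product


def threshold_unit(weights, threshold, inputs):
    """Hard-limiting neuron: outputs 1 when the weighted sum reaches the threshold."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation >= threshold else 0


def truth_table(weights, threshold):
    """Responses over all binary input combinations, in lexicographic order."""
    n = len(weights)
    return [threshold_unit(weights, threshold, bits)
            for bits in product((0, 1), repeat=n)]


AND_TABLE = truth_table((1, 1), 2)
OR_TABLE = truth_table((1, 1), 1)
NOT_TABLE = truth_table((-1,), 0)


def single_unit_realizes(target, grid=(-2, -1, 0, 1, 2)):
    """Search a small grid of weights and half-integer thresholds for a matching unit."""
    thresholds = [t / 2 for t in range(-6, 7)]
    return any(truth_table((w1, w2), th) == target
               for w1 in grid for w2 in grid for th in thresholds)
```

Conjunction, disjunction and negation are each realized by a single unit. The exclusive-or table `[0, 1, 1, 0]` is not found by the search, since it is not linearly separable and needs more than one unit.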

Neurons with hardlimiter action functions have been studied as
threshold logic during the 1960s (cf, Cover, 1965; Hurst, 1971) at a
time when neural networks had not been assigned the present sta-
tus of popularity/notoriety. Graded (monotonic) response, as provided
by sigmoid functions, has been incorporated into the functionality of
(recurrent) neural networks by Hopfield (1984), and has been consid-
ered essential for the automatic specification of parameters (weights) in 
multi-layered neural networks. (See Rumelhart, Hinton & Williams, 
1986; McClelland, Rumelhart, et al, 1986a; 1986b; Matheus & Ho- 
hensee, 1987; Hinton, 1989; Soucek, 1992 for a discussion on learn- 
ing in multi-layered neural networks through procedures involving a 
backpropagation of errors.) 

As the sigmoidal action function has come to be accepted in the
neural network research community, the function provided by networks
of neurons incorporating sigmoidal action functions has been related
to formulae of fuzzy logic. The monotonicity in discrimination provided
by hardlimiter and sigmoid action functions has been held responsible
for the inability of neural networks (with a single hidden layer, eg,
perceptrons) to satisfactorily approximate desired class membership
functions, and suggestions for overcoming this limitation have been
made in terms of non-monotonic action functions, typically radial basis
functions (cf, Poggio & Girosi, 1990).
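
The limitation attributed to monotonic action functions can be seen in a one-dimensional sketch. It assumes a class that occupies an interval around the origin, a Gaussian unit as the radial basis function, and a small grid of sigmoid parameters; all of these are illustrative assumptions.

```python
import math


def gaussian_unit(x, centre=0.0, width=1.0):
    """Non-monotonic (radial basis) response, largest at the centre."""
    return math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))


def sigmoid_unit(x, weight, threshold):
    """Monotonic response of a single sigmoidal neuron."""
    return 1.0 / (1.0 + math.exp(-(weight * x - threshold)))


def classify(value, cut=0.5):
    """Crisp class membership obtained by thresholding a unit response."""
    return 1 if value > cut else 0


SAMPLES = [-3.0, 0.0, 3.0]  # outside, inside, outside the class
GAUSSIAN_LABELS = [classify(gaussian_unit(x)) for x in SAMPLES]


def monotone_unit_peaks_in_middle(weights, thresholds):
    """True if some sigmoid unit responds more strongly at 0 than at both -3 and 3."""
    for w in weights:
        for th in thresholds:
            left, mid, right = (sigmoid_unit(x, w, th) for x in SAMPLES)
            if mid > left and mid > right:
                return True
    return False
```

The Gaussian unit labels the three samples as outside, inside, outside. No single sigmoid unit on the grid produces a response that is strictly largest at the middle sample, because its response is monotonic in the input.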

Synaptic transmission, in real world neurons, is generally noisy,
and with this consideration the rate of neural firing, depicted by the
action/categorization function, has been expressed in the literature as
being probabilistically related to the membrane potential. In the case
of binary neurons, the probability of the neuron firing (at the higher of
two frequencies, ie, $y = \zeta_1$) is given by

$$P(y = \zeta_1 \mid \eta(x, t)) = \sigma(\eta(x, t)), \qquad \text{(A.1)}$$

$$P(y = \zeta_0 \mid \eta(x, t)) = 1 - P(y = \zeta_1 \mid \eta(x, t)),$$

where the function $\sigma$ is generally of the sigmoidal type with $[\zeta_-, \zeta_+] =
[0, 1]$ (see eg, Peretto, 1992). In this thesis, $\sigma$ denotes an abstract map-
ping of the membrane potential to the neural response: depending on
the context, this notation will be interpreted as a deterministic decision
function or a probabilistic appraisal of the (binary) neural decision.

Interpretation of neurons in stochastic terms has, however, not
been limited to consideration of synaptic transmission noise. As neu-
rons inherently support dynamics, and accommodate excitatory and
inhibitory inputs, stochastic dynamical systems, in particular, birth-
death processes, are not infeasible. Independent Poisson streams of
discrete pulses (inspired by models of spike trains along axons and
dendritic arborescence) have been identified, in the literature, with the
excitatory and inhibitory inputs, though with different (arrival) rates,
and the equilibrium rate of the (non Poisson) output pulse stream (ie,
neural activity level) is related to those of the inputs.

This simple demonstration relating neurons with stochastic dy- 
namical systems suggests the feasibility of applying the paradigm of 
neural networks (with associated aspects of learning and generaliza- 
tion) to the study of networks of queues common in analysis, and de- 
sign of distributed processing systems, and high-speed communication 
networks.^® Pursuing this line of reasoning, it might not be infeasible to 
incorporate neural networks in studies of stochastic decision systems, 
in particular Markov decision processes (cf, Derman, 1970). Hidden
Markov Models, frequently used in Classical AI for speech recognition, 
too relate in this sense to neural networks, and studies linking the two 
function synthesis approaches have been reported in the literature. 


^'^In terms of the notations introduced earlier, the inputs $x_i$, $i = 1, 2, \ldots, n$, are Poisson
distributed pulse streams (with mutual independence); the interconnection strengths $s_i$,
$i = 1, 2, \ldots, n$, (and consequently, the weights $w_i$, $i = 1, 2, \ldots, n$) control the nature of
mixing of input streams to influence the membrane potential $\eta$, which plays the role of
an internal state variable; the abstract amplification and translation functions, $a$ and
$b$ respectively, control the (statistical) feedback; and the action function $\sigma$ relates the
equilibrium distributions of the state variable $\eta$ and the output $y$.

See Kleinrock, 1975; Bertsekas & Gallager, 1987; Walrand, 1988 for an analytical
treatment of networks of queues, and their role in distributed processing systems and
high-speed communication networks.

Notations relevant to all chapters

Relations

$\preceq$            A generic partial ordering relation.
$\leq$               The relation less than or equal to.
$\triangleq$         Equal by definition.
$\forall$            The universal quantifier.
$\exists$            The existential quantifier.

Spaces

$\emptyset$          The empty set.
$\Re$                The real number field.
$\Re^n$              Vector space of $n$-tuples of real numbers.
$\Re_+$              Collection of positive reals in $\Re$, ie, $\Re_+ = \{x \mid x \in \Re \text{ and } x > 0\}$.
$\mathcal{X}$        Collection of inputs available for processing. It is common
                     to find $\mathcal{X} \subset \Re^n$ for an appropriate value of $n$, $n = 0, 1, \ldots$
$\mathcal{Y}$        Collection of labels, or values, to be output as a result of
                     processing. The space $\mathcal{Y}$ is, in general, a compact subspace
                     of $\Re^m$ for an appropriate value of $m$, $m = 0, 1, \ldots$
$\mathfrak{X}$       Collection of input signals.
$\mathfrak{Y}$       Collection of output signals.
$\ell^p(A)$          The collection of sequences on the (support) set $A$ that are
                     summable in the $p$-th power of the absolute values. When
                     the support set is the real number field, this collection
                     is denoted by $\ell^p$. ($\ell^2$ is the collection of square summable
                     sequences on $\Re$.)
$L^p(A)$             The collection of functions on the (support) set $A$ that are
                     integrable in the $p$-th power of the absolute values. When
                     the support set is the real number field, this collection
                     is denoted by $L^p$. ($L^2$ is the collection of square integrable
                     functions on $\Re$.)
$C(A)$               The collection of continuous functions defined on the space $A$.
$C^\infty(A)$        The class of analytic functions defined on the space $A$.

Variables

$i, j$               Generic indexing variables.
$n$                  Dimensionality of (input) space, $n = 0, 1, \ldots$
$x$                  A vector denoting the collection of elements to be processed
                     upon, $x \in \mathcal{X}$.
$\eta$               A vector denoting the collection of elements that represent
                     an intermediate level of processing, $\eta \in \Re^m$ for an appropri-
                     ate value of $m$, $m = 0, 1, \ldots$
$y$                  A vector denoting the collection of elements that form the
                     outputs of the processor, $y \in \mathcal{Y}$.
$w$                  A weight vector denoting the collection of elements that op-
                     erate on corresponding channels in a neuron, $w \in \Re^n$, where
                     $n$ refers to the number of channels in the neuron: the di-
                     mensionality of the collection of input patterns, $\mathcal{X}$, incident
                     on a neuron is assumed to be the same as the number of
                     input channels in the neuron.
$W$                  A weight matrix corresponding to a 'layer' of neurons, composed
                     of the weight vectors of the distinct neurons in the layer. All
                     neurons in a 'layer' are assumed to have an identical number
                     of input channels to simplify symbolization and analysis.
$\zeta_-, \zeta_+$   Limits of a connected interval denoting the acceptable val-
                     ues of outputs. $\zeta_-, \zeta_+ \in \Re$ such that $\zeta_- < \zeta_+$.
$t, \nu$             An (independent) variable with the connotations of time,
                     $t \in [0, +\infty)$. When the time travel is restricted to discrete
                     spaces, the notation used is $\nu$. The variable $\nu$ is restricted
                     to a space that is in one-one correspondence with the set of
                     naturals $0, 1, \ldots$

Constants

$\mathbf{0}$         The zero vector. Dimension is to be read from the context.
$\mathbf{1}$         The vector of ones. Dimension is context dependent.

Functions

$x$                  A function denoting the signal to be processed, $x \in \mathfrak{X}$.
$y$                  A function denoting the result of processing, $y \in \mathfrak{Y}$.
$\mu$                Measure, generally Lebesgue.
$\sigma, \sigma_h, \sigma_s, \sigma_g$   Activation function. The commonly encountered types of
                     activation functions are the hard-limiter ($\sigma_h$), sigmoidal ($\sigma_s$)
                     and Gaussian ($\sigma_g$) functions.

Operations

$w \cdot x$          Inner-product (dot product) between vectors $w$ and $x$. The
                     vectors $w$ and $x$ are assumed to belong to a common (Hilbert)
                     space of patterns.
$\langle w, x \rangle$   Inner-product (dot product) between functions (signals) $w$
                     and $x$. The functions $w$ and $x$ are assumed to belong to a
                     common (Hilbert) space of signals.
$|\cdot|$            Cardinality, when interpreted on sets.
                     Metric, when interpreted on functions.
                     The distinction should be clear from the context.
$\|\cdot\|$          Norm of a member (vector, function, etc) of an appropriate
                     vector space. When the space, say $X$, is specifically indi-
                     cated, the norm is indicated by $\|\cdot\|_X$. Where necessary, the
                     measure, say $\mu$, through which the norm is defined is ex-
                     plicitly indicated by the qualification $\|\cdot\|_X [\mu]$.
$O|_S$               Restriction of an operation, $O$, to a region, $S$, smaller than
                     the domain of $O$. $S \subset \mathcal{D}_O$, where $\mathcal{D}_O$ is the domain of the
                     operation $O$.
$\bar{A}$            Closure of the set $A$.
$\setminus$          Set difference.
$P(X = x)$           The probability of the random variable $X$ taking on an in-
                     stance $x$.
$\vee$               The binary operation of supremum (maximum) of the given
                     (two) arguments.
$\wedge$             The binary operation of infimum (minimum) of the given
                     (two) arguments.
$\circ$              The binary operation of function composition.

Notations defined in Chapter 2 

Notations defined in Section 2 

$\Xi$                Domain of definition of the signal $x$. Scanning of the signal
                     over this domain is indicated by the progression of $\xi \in \Xi$.

$\Theta$             Domain of definition of the signal $y$. Scanning of the signal
                     over this domain is indicated by the progression of $\theta \in \Theta$.

$\mathcal{X}$        Range space of the signal $x$.

$\mathcal{Y}$        Range space of the signal $y$.

$\mathfrak{A}_x$     Algebraic structure (of appropriate kind) on $\mathcal{X}^{\Xi}$, the space
                     of all signals from $\Xi$ to $\mathcal{X}$. ($\mathfrak{A}_x$ contains subsets of $\mathcal{X}^{\Xi}$.)

$\mathfrak{A}_y$     Algebraic structure (of appropriate kind) on $\mathcal{Y}^{\Theta}$, the space
                     of all signals from $\Theta$ to $\mathcal{Y}$. ($\mathfrak{A}_y$ contains subsets of $\mathcal{Y}^{\Theta}$.)

                     Space of measurements on signal $x$. (Possibly the same as
                     $\mathfrak{A}_x$.)

                     Space of measurements on signal $y$. (Possibly the same as
                     $\mathfrak{A}_y$.)

$\mathcal{N}_x(\theta)$   Neighbourhood structure in $\Xi$ at the scan position $\theta$,
                     $\mathcal{N}_x(\theta) \subset \Xi$ for all $\theta \in \Theta$.

$\mathcal{N}_y(\theta)$   Neighbourhood structure in $\Theta$ at the scan position $\theta$,
                     $\mathcal{N}_y(\theta) \subseteq \Theta$ for all $\theta \in \Theta$.

$a_x(\theta)$        Assignments of $x$ over $\mathcal{N}_x(\theta)$. (Note that $a_x(\theta)$ is a signal
                     and also identifies an ordered subset (ordered by
                     $\mathcal{N}_x(\theta)$) of $\mathcal{X}$ such that $\forall \theta \in \Theta$, $a_x(\theta) \in \mathfrak{A}_x$.)

$a_y(\theta)$        Assignments of $y$ over $\mathcal{N}_y(\theta)$. (Note that $a_y(\theta)$ is a signal
                     and also identifies an ordered subset (ordered by
                     $\mathcal{N}_y(\theta)$) of $\mathcal{Y}$ such that $\forall \theta \in \Theta$, $a_y(\theta) \in \mathfrak{A}_y$.)

$\phi$               Indexed collection of measures^ on $\mathfrak{B}_x$, indexed by $\theta \in \Theta$.

$\psi$               Indexed collection of measures on $\mathfrak{A}_y$, indexed by $\theta \in \Theta$.

$f$                  Mechanism (method) by which the evaluation of assign-
                     ments to $y$ are arrived at. This includes correlations be-
                     tween signals $x$ and $y$.

$s$                  Measure of mismatch (ie, (un)satisficability).

6                    The repertoire of distinct labels (possibly numbers) used to
                     distinguish the possible mismatches.

                     The desired form of signal on processing.

^In this section, the symbols $\mathcal{X}$, $\mathcal{Y}$, $\theta$, $\Theta$, $\xi$ and $\Xi$ have an interpretation different from
that in the rest of the thesis.

^More precisely, $\phi$ and $\psi$ are product measures on the algebraic structures $\mathfrak{A}_x$ and
$\mathfrak{A}_y$ respectively. These functions measure an appropriate (desired) aspect of the relative
organization of assignments in the signals $x$ and $y$, ie, $\phi$ and $\psi$ are predicates compatible
to signals $x$ and $y$.

g An appropriate function (possibly incorporating -0) specify- 

ing the idealized (or expected) form of processing needed. 

$\rho$               The mechanism by which comparison between the output
                     signal and the desired or idealized signal forms is achieved.
                     In functional analytic terms, $\rho$ is, generally, a (semi)metric,
                     ie, a metric (distance function) with the axiom of unsigned-
                     ness relaxed.

ro                   A window function.

$b$                  A 'basic' wavelet window.

$\varphi$            A scaling function.

Notations defined in Section 2.2

$\eta$               Potential accumulated on the membrane of a neuron (ie,
                     neuron state, also termed as post synaptic potential).
$x_i$                Activity (input) on (dendritic) channel $i$, $i = 1, 2, \ldots, n$, $n$
                     being the number of channels.
$a$                  An abstract amplification function indicating the mecha-
                     nism of modulation (decay) of the membrane potential $\eta$,
                     with the restriction that $a$ takes non-negative
                     values for reasons of stability.
$b$                  An abstract translation function specifying the extent of
                     state translation in the dynamics of the membrane potential
                     $\eta$.
$s_i$                Interconnection strength, or synaptic efficacy, of channel $i$,
                     $i = 1, 2, \ldots, n$.
$y$                  Action/response of the neuron, physiologically associated
                     with the frequency of axonal spike generation, $y \in [\zeta_-, \zeta_+] \subset
                     \Re$ for neurons with continuous valued outputs with appro-
                     priate values for $\zeta_-$ and $\zeta_+$, or $y \in \{\zeta_0, \zeta_1, \ldots, \zeta_c\}$, with a
                     priori values $\zeta_j \in \Re$, $j = 0, 1, \ldots, c$, $c = 1, 2, \ldots$ being (one
                     less than) the number of categories, for neurons with dis-
                     crete valued outputs.
$\sigma$             Activation function mapping the membrane potential $\eta$ to
                     the response (axonal spike frequency) $y$, generally using a
                     non-linear method (possibly with a provision for refractory
                     time), $\sigma: \Re \to [\zeta_-, \zeta_+]$ for continuous valued neurons and
                     $\sigma: \Re \to \{\zeta_0, \zeta_1, \ldots, \zeta_c\}$ for discrete valued neurons.
$C$                  Membrane capacitance (constant amplification in conduc-
                     tance model), $C > 0$.
$R$                  Membrane (leakage) resistance (linear translation in con-
                     ductance model), $0 < R < \infty$.
$R_i^{-1}$           Conductance of channel $i$ (connection strength in conduc-
                     tance model), $0 < |R_i^{-1}| < \infty$, $i = 1, 2, \ldots, n$.
$I$                  Current applied externally (static translation in conduc-
                     tance model), $I \in \Re$.
$\tau = RC$          Membrane charge-discharge time constant, $0 < \tau < \infty$.
$w_i$                Weighting value associated with channel $i$, $w_i \in \Re$, $i =
                     1, 2, \ldots, n$.
$\theta$             Threshold of firing, $\theta \in \Re$.
$L$                  Number of layers in the network.
$m_\ell$             Number of processing nodes (ie, neurons) in layer $\ell$, $\ell =
                     1, 2, \ldots, L$.

                     Feed-through synaptic efficacies (ie, inter-layer interconnec-
                     tion strengths) for processing node $j_\ell$ in layer $\ell$, $j_\ell =
                     1, 2, \ldots, m_\ell$, $\ell = 1, 2, \ldots, L$.

                     Feed-back (recurrent) synaptic efficacies (ie, intra-layer in-
                     terconnection strengths) for processing node $j_\ell$ in layer $\ell$,
                     $j_\ell = 1, 2, \ldots, m_\ell$, $\ell = 1, 2, \ldots, L$.

                     Propagation (and refractory) delay in the feed-through path
                     from processing node $j_{\ell-1}$ of layer $(\ell - 1)$ to processing node $j_\ell$
                     in layer $\ell$, $j_{\ell-1} = 1, 2, \ldots, m_{\ell-1}$, $j_\ell = 1, 2, \ldots, m_\ell$, $\ell = 1, 2, \ldots, L$.

                     Propagation (and refractory) delay in the feed-back path
                     from processing node $i_\ell$ to processing node $j_\ell$, both in layer
                     $\ell$, $i_\ell = 1, 2, \ldots, m_\ell$, $j_\ell = 1, 2, \ldots, m_\ell$, $\ell = 1, 2, \ldots, L$.


Notation defined in Section 2.3

{x) A homogeneous polynomial of degree i in the elements of

the vector x.


Notations defined in Chapter 3 
Notations defined in Section 3.1 

$B^n$                Collection of all $n$ dimensional binary vectors (the elements
                     are in $\{-1, +1\}$). This is also referred to as the Boolean space
                     of $n$ dimensions.

$B^n(\zeta, \vartheta)$   Generalized Boolean space of $n$ dimensions derived by scal-
                     ing and translating $B^n$. $\zeta \in \Re_+$ is the scale factor and
                     $\vartheta \in \Re^n$ denotes the extent of translation.

$\mathcal{L}_w$      Linear sub-space in $\Re^n$ in the direction of $w$,
                     $\{\beta w \mid \beta \in \Re\}$.

Pn Collection of preservance weights corresponding to a dis- 

crete input space of n dimensions. 

Pn (a) Restriction of Pn to weights having a as the common factor 

of all elements. 

Enumerated preservance weight (c is the enumeration in- 
dex). 

Pt 2 ti A permutation operation relating the transformation of a 

preservance weight e i > preservance weight > , 

^ Pn(o:), for some a € 

{(y£) A discrete subset of . 

Vri^y:^) A discrete subset of rank r, r = 1,2, . . in dT' useful in 

the study of input space preservation in isolated neurons. 
C G 5R+ is the scale factor and i? G is the translation. 
Vx iCyl) = (C, t?) for all n = 1, 2, . C € and t? G 

£^(a, (C, 1 ?)) Preservation points in under a weight w corresponding 

toB"(C,^). 

£te.(a, Vr (C» 3^)) Preservation points in under a weight w corresponding 
to P”‘(C,32)* When the norm of the preservance weights is 
fixed, then a is replaced by ||t£||. The resulting notation is 

£-(INI,^r(C,^)). 

Notations defined in Section 3.2

$\theta$             Value of threshold in a comparison.

                     The smallest (connected) interval in $\Re$ between the $i$-th and
                     $(i+1)$-th ordered points in $\mathcal{L}_w(\alpha, V_r(\zeta, \vartheta))$.

Notations defined in Section 3.3

$\mathcal{T}$        Collection of inputs listed in the training set. $\mathcal{T} \subset V_r(\zeta, \vartheta)$.
                     For the paradigm of learning by examples to be meaningful,

$\mathcal{L}_w(\alpha, \mathcal{T})$   Preservation points in $\Re$ of $\mathcal{T}$, under a preservance weight
                     $w \in P_n(\alpha)$ for any $\alpha \in \Re_+$.

                     Dichotomy on $V_r(\zeta, \vartheta)$.

$w \cdot A$          The collection of projections of the vectors in the set $A$ along
                     the vector $w$, ie, $w \cdot A = \{w \cdot x \mid x \in A\}$. The vector $w$ and
                     the vectors in $A$ belong to a common innerproduct (Hilbert)
                     space.

Notations defined in Section 3.4

                     The totality of (scaled and shifted) radix $r$ vectors in $\Re^n$,
                     $r = 2, 3, \ldots$ $\zeta \in \Re_+$ is the scale factor and $\vartheta \in \Re^n$ is the
                     translation.

                     A discrete subset of rank $r$, $r = 1, 2, \ldots$, in $\Re^n$ useful in
                     the study of preservation of an input space of radix $r$, $r =
                     2, 3, \ldots$, vectors in isolated neurons. $\zeta \in \Re_+$ is the scale
                     factor and $\vartheta \in \Re^n$ is the translation.

                     The discrete space (with scale factor $\zeta$ and translation $\vartheta$)
                     rotated in such a way that any vector in $\mathcal{L}_w \setminus \{\mathbf{0}\}$ is a
                     preservance weight of the discrete input space.

Notations defined in Chapter 4
Notations defined in Section 4.1

t) (x) The response of a single layer neural signal processor oper- 

ating on an input pattern x. For all ^ € A' C t)(x) e 

t) An array denoting the responses of a single layer neural 

signal processor to the input vectors in the discrete space 

y A matrix denoting the responses of the neurons participat- 

ing in a single layer neural signal processor for inputs in 
the preservance input space (Cj 

A* The Moore-Penrose pseudo inverse of a matrix A. 

Notations defined in Section 4.2

The response in layer £ of a multi-layered neural signal 
processor operating on an input pattern x. For all x € Af C 

The weight vector associated with the j^^^-th processing 
node in layer i. 

The threshold associated with the -th processing node in 
layer L 

An array denoting the responses in the fc-th layer, k == 
1, 2, , of a k layer neural signal processor to the input 
vectors in the discrete space f 7^^ (C, t7). 

A matrix denoting the responses of the neurons in the k-
th layer participating in a k layer neural signal processor
for inputs in the preservance input space. The

number of sign-transitions in the responses of individual
neurons is restricted to be no more than

Notations defined in Section 4.4 

The partition induced on the set A.

The distinct members of the partition on the set A, indexed
from 1 to the number of members of the partition.

Notations defined in Chapter 5 

Notations defined in Section 5.1 

A type-k neural signal processor (indicated by the response).

The collection of functions realized by a neural signal
processor of type-k.

The collection of functions realized by a type-k neural signal
processor, as a consequence of possible evolution, at time
t. It is a subset of the collection of functions realized by
type-k neural signal processors.

The collection of evolutionary neural functions realized by
type-k neural signal processors whose nodes are as specified
in the argument vector.

C An arbitrary compact subset of 

Component functions in the function representation scheme
of Kolmogorov.
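
For reference, the representation scheme of Kolmogorov is usually stated in the following standard form. The symbols $\Phi_q$ and $\varphi_{q,p}$ are the conventional textbook choices and are an assumption here, since they need not match the symbols used for the component functions in this work.

```latex
% Kolmogorov superposition: every continuous f on the unit cube [0,1]^n
% can be written with continuous univariate component functions
% \Phi_q (outer) and \varphi_{q,p} (inner).
\[
  f(x_1, \ldots, x_n)
  = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right),
  \qquad (x_1, \ldots, x_n) \in [0,1]^n .
\]
```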

Notations defined in Section 5.2

${}^{m}\mathfrak{F}$, ${}^{d}\mathfrak{F}$, ${}^{a}\mathfrak{F}$ A foliation on the input space. The prefixes ’m’, ’d’ and
’a’ denote foliations due to measurement, discrimination and
aggregation operations, respectively. A foliation is essentially
a partition wherein all members are indexed. Each
(indexed) member of the partition is termed a leaf of the
foliation.
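
A foliation, understood as an indexed partition, can be sketched on a finite input set as follows. This is only an illustration under stated assumptions: the function `foliate`, the sign-based index and the sample points are hypothetical, and a finite set stands in for the input space.

```python
from collections import defaultdict

def foliate(points, index_of):
    """Partition points into leaves; each leaf is labelled by index_of(point)."""
    leaves = defaultdict(list)
    for p in points:
        leaves[index_of(p)].append(p)
    return dict(leaves)

# Hypothetical input set and a sign-based index, loosely in the spirit
# of a discrimination operation.
points = [-2.0, -0.5, 0.0, 0.5, 3.0]
sign = lambda x: (x > 0) - (x < 0)

foliation = foliate(points, sign)  # maps each index to its leaf
stem = sorted(foliation)           # the index set of the leaves
print(stem, foliation)
```

The keys of `foliation` play the role of the index set of the leaves, and every input point belongs to exactly one leaf.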

$K_{\mathfrak{F}}$ Stem of the foliation $\mathfrak{F}$. This is the index set for the leaves
of the foliation $\mathfrak{F}$.

Notations defined in Section 5.3 

x A function(al) denoting the pattern incident on the decision
unit(s), defined on S.

y A function(al) denoting the response pattern of the neural
signal processor, defined on S.

w Weight function in a neuron defined on a function space X.
w $\in$ W, a function space which is embedded in the same
Hilbert space as X.

v Combining function in a neural signal processor defined on
a function space X. v $\in$ W, a function space which is
embedded in the same Hilbert space as X.

Kw, Ke, Kv Kernels of the integral transforms of measurement and
aggregation. Kw is the kernel of the integral transform due
to feed-through associations in the measurement process,
while Ke is the kernel of the integral transform of
measurement due to lateral interaction. Kv is the kernel of the
integral transform due to aggregation.
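
An integral transform with a kernel K maps a pattern x to y(t) = ∫ K(t, s) x(s) ds. The sketch below discretizes such a transform with the trapezoidal rule. The function name `integral_transform`, the grid and the Gaussian kernel are illustrative assumptions and are not the kernels defined in this work.

```python
import numpy as np

def integral_transform(kernel, x, s):
    """Approximate y(t_i) = integral of K(t_i, s) x(s) ds on the grid s (trapezoidal rule)."""
    s = np.asarray(s, dtype=float)
    x = np.asarray(x, dtype=float)
    # Trapezoidal quadrature weights for a possibly non-uniform grid.
    weights = np.empty_like(s)
    weights[1:-1] = (s[2:] - s[:-2]) / 2.0
    weights[0] = (s[1] - s[0]) / 2.0
    weights[-1] = (s[-1] - s[-2]) / 2.0
    K = kernel(s[:, None], s[None, :])  # K[i, j] = kernel(t_i, s_j)
    return K @ (weights * x)

# Hypothetical example: a Gaussian kernel acting on a ramp pattern.
s = np.linspace(0.0, 1.0, 101)
x = s.copy()
gaussian = lambda t, u: np.exp(-((t - u) ** 2))
y = integral_transform(gaussian, x, s)
print(y[:3])
```

With a constant kernel equal to one, the transform reduces to the integral of the pattern, which gives a simple sanity check.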
Notations defined in Chapter 6

Notations defined in Section 6.1

D The operator of ordinary differentiation with respect to the 

independent variable. 

$\hat{f}$ The Fourier transform of a function f, f $\in L^2$.

$\omega$ A variable whose reciprocal has the connotations of
recurrence (periodicity).

r An independent variable denoting traversal in the domain 

of definition of the weighting functions w. 

Notations defined in Section 6.2

$D^j$ Ordinary differentiation operator of order j, j = 1, 2, ...

$\mathcal{L}_{x_\alpha}$ A one-dimensional linear subspace in the ’direction’ of the
pattern (signal) $x_\alpha$, $x_\alpha \in$ X.

$D^j_{x_\alpha}$ The operation of directional differentiation of order j, j =
1, 2, ..., in the direction of the pattern (signal) $x_\alpha$, $x_\alpha \in$ X,
$\|x_\alpha\| = 1$.
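
A first-order directional derivative along a unit-norm direction can be approximated by a central difference. The sketch below covers order one only and assumes finite-dimensional vectors in place of patterns in a function space. The name `directional_derivative`, the step size and the test functional are hypothetical.

```python
import numpy as np

def directional_derivative(f, x, direction, h=1e-5):
    """Central-difference estimate of the first-order derivative of f at x along direction."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)  # enforce unit norm for the direction
    return (f(x + h * d) - f(x - h * d)) / (2.0 * h)

# Hypothetical functional: the squared norm, whose derivative at x along d is 2 <x, d>.
squared_norm = lambda v: float(v @ v)
value = directional_derivative(squared_norm, [1.0, 2.0], [1.0, 0.0])
print(value)
```

Higher orders would follow by repeating the difference along the same direction, at the cost of greater sensitivity to the step size.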

Notations defined in Section 6.3 

Region of localization in the domain of definition of the
function f.

Width of the region of localization in the domain of definition
of the function f.

Notations defined in Section 6.4 

$\{\varphi_i\}_{i=1}^{n}$ A collection of n, n = 1, 2, ..., linearly independent
functions. The kernel of a reproducing kernel Hilbert space is
represented as certain linear combinations of these ’basis’
functions.

$\delta$ The Dirac delta function.

$\Omega$ The bandwidth of band-limited signals (concepts).

Sampling functions in layer $\ell$ associated with the integral
transform of (feed-through) measurement.

${}^{w}\varphi_i^{(\ell)}$ A collection of linearly independent functions that are used
to realize self-reproducing kernels of the integral transforms
of measurement due to feedthrough associations in
the $\ell$-th layer of a multi-layer neural signal processor, i =
1, 2, ..., ${}^{w}N^{(\ell)}$, $\ell$ = 1, 2, .... These functions are termed
’reproducing feedthrough measurement kernel basis’ functions.

A collection of linearly independent functions that are used
to realize self-reproducing kernels of the integral transforms
of measurement due to lateral interactions in the
$\ell$-th layer of a multi-layer neural signal processor, i =
1, 2, ..., $\ell$ = 1, 2, .... These functions are termed
’reproducing lateral measurement kernel basis’ functions.

A collection of linearly independent functions that are used
to realize self-reproducing kernels of the integral transforms
of aggregation in the $\ell$-th layer of a multi-layer neural
signal processor, i = 1, 2, ..., $\ell$ = 1, 2, .... These
functions are termed ’reproducing aggregation kernel basis’
functions.


The coefficients of the sum of products of ’reproducing
feedthrough measurement kernel basis’ functions that are used
to synthesize the kernels of the integral transforms of
measurement due to feedthrough associations. The values of
these coefficients are given by the inverse of the Gram
matrix constructed from the mutual inner products of the
’reproducing feedthrough measurement kernel basis’ functions.

The coefficients of the sum of products of ’reproducing
lateral measurement kernel basis’ functions that are used to
synthesize the kernels of the integral transforms of
measurement due to lateral interactions. The values of these
coefficients are given by the inverse of the Gram matrix
constructed from the mutual inner products of the
’reproducing lateral measurement kernel basis’ functions.

The coefficients of the sum of products of ’reproducing
aggregation kernel basis’ functions that are used to synthesize
the kernels of the integral transforms of aggregation. The
values of these coefficients are given by the inverse of the
Gram matrix constructed from the mutual inner products
of the ’reproducing aggregation kernel basis’ functions.
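
The role of the inverse Gram matrix can be checked numerically. For linearly independent functions $\varphi_i$ with Gram matrix G, the kernel K(s, t) = Σ_ij (G^{-1})_ij φ_i(s) φ_j(t) reproduces every function in their span. The sketch below verifies this. The monomial basis on [0, 1], the L2 inner product and all function names are illustrative assumptions, not the basis functions of this work.

```python
import numpy as np

# Gauss-Legendre quadrature mapped to [0, 1]; exact for the low-degree
# polynomials used in this example.
nodes, weights = np.polynomial.legendre.leggauss(8)
nodes, weights = (nodes + 1.0) / 2.0, weights / 2.0

def basis(t):
    """Hypothetical basis functions 1, t, t^2 evaluated at t (one row per function)."""
    t = np.asarray(t, dtype=float)
    return np.stack([np.ones_like(t), t, t ** 2])

def inner(f_vals, g_vals):
    """L2[0, 1] inner product computed from samples at the quadrature nodes."""
    return float(np.sum(weights * f_vals * g_vals))

Phi = basis(nodes)           # shape (3, 8)
G = (Phi * weights) @ Phi.T  # Gram matrix of mutual inner products
C = np.linalg.inv(G)         # coefficients of the sum of products

def kernel(s, t):
    """K(s, t) = sum over i, j of C[i, j] * phi_i(s) * phi_j(t)."""
    return basis(s).T @ C @ basis(t)

# A function in the span of the basis, and the point where it is reproduced.
f = lambda t: 1.0 + 2.0 * t - 3.0 * t ** 2
t0 = 0.3
reproduced = inner(kernel(nodes, t0), f(nodes))
print(reproduced, f(t0))
```

The inner product of K(·, t0) with f returns f(t0), which is the reproducing property that the coefficient matrices are meant to secure.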

The collection of concepts (signals) over which an appropri- 
ate neural signal processor is defined to realize the ’repro- 
ducing kernel basis’ functions. 

Concepts (signals) in 

Directions in the concept (signal) space X. The directional
derivatives, along these directions, of an appropriately chosen
neural signal processor are used as the ’reproducing
kernel basis’ functions.

The neural signal processor (scalar, non-evolutionary and
type-1) over the collection of concepts (signals) whose
directional derivatives are chosen as
the ’reproducing kernel basis’ functions.

A denumerable collection of integers. 

Functions defined to choose the scale factors of the ’reproducing
kernel basis’ functions in layer $\ell$. Entities of a
similar nature are distinguished by prefixes: ’w’,
’e’ and ’v’ denote the scaling associated with the basis functions
of the kernels of measurement due to feedthrough,
measurement due to lateral interaction and aggregation,
respectively.

Functions defined to choose the translations of the ’reproducing
kernel basis’ functions in layer $\ell$. Entities of a
similar nature are distinguished by prefixes: ’w’,
’e’ and ’v’ denote the shifts associated with the basis functions
of the kernels of measurement due to feedthrough,
measurement due to lateral interaction and aggregation,
respectively.